Sun provided the following, quite detailed, analysis of my recent kernel
panic, which seems to have been caused by ipf while traversing a linked
list. Any thoughts? Darren, the core file is still available if you'd like
to take a look at it.
Thanks...
Core analysis of vmcore.0:
core file: /cores/63354516/vmcore.0
release: 5.8 (64-bit)
version: Generic_108528-15
machine: sun4u
node name: karm
hw_provider: Sun_Microsystems
system type: SUNW,Sun-Fire-280R
hostid: 83180cda
time of crash: Thu Jan 9 22:40:27 MST 2003
age of system: 6 days 10 hours 27 minutes 51.83 seconds
panic cpu: 1 (ncpus: 2)
panic string: BAD TRAP: type=34 rp=2a101176390 addr=2f6465762f6d6447 mmu_fsr=0
SolarisCAT(vmcore.0)> panic
panic on cpu 1
panic string: BAD TRAP: type=34 rp=2a101176390 addr=2f6465762f6d6447 mmu_fsr=0
==== panic user thread: 0x30004e8a420 pid: 1138 on cpu: 1 ====
cmd: /opt/CSCOpx/objects/availability/bin/avpoller -i /opt/CSCOpx/objects/availabili
t_stk: 0x2a101177af0 sp: 0x10422cb1 t_stkbase: 0x2a101172000
t_pri: 51(TS) t_lwp: 0x30004e86078 machpcb: 0x2a101177af0
t_procp: 0x300028c6050 p_as: 0x30001baea88 hat: 0x3000005cd60 cnum: 0x216
size: 22503424 rss: 17752064
bound cpuid: 1 last cpuid: 1
idle: 1 ticks (0.01 seconds)
start: Fri Jan 3 12:15:48 2003
age: 555879 seconds (6 days 10 hours 24 minutes 39 seconds)
stime: 55607182 (0.01 seconds earlier)
syscall: sendto (0x1995c8)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_PANIC - thread initiated a system panic
tpflg: TP_TWAIT - wait to be freed by lwp_wait
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag: SLOAD - in core
       SULOAD - u-block in core
       SNOWAIT - children never become zombies
pc: 0x10044704 unix:panicsys+0x44: call unix:setjmp
unix:panicsys+0x44 (0x10423680, 0x2a101176148, 0x10050f90, 0x78002000, 0x2a101176930, 0x0)
unix:vpanic+0xcc (0x10050f90, 0x2a101176148, 0xe, 0x1, 0x3000b779d98, 0x3000c6b73c0)
unix:panic+0x1c (0x10050f90, 0x34, 0x2a101176390, 0x2f6465762f6d6447, 0x0, 0x3000019e188)
unix:die+0xa4 (0x34, 0x2a101176390, 0x2f6465762f6d6447, 0x0, 0x2a101176390, 0x0)
unix:trap+0x5d0 (0x2f6465762f6d6447, 0x0, 0x800009, 0x10000, 0x2a101176390, 0x0)
unix:prom_rtt+0x0 (0x11, 0x4200ff11, 0x0, 0x2a101176930, 0x0, 0x0)
-- trap data type: 0x34 (memory address not aligned) rp: 0x2a101176390 --
addr: 0x2f6465762f6d6447
pc: 0x7806b6a4 ipf:fr_scanlist+0xb4: ldx [%i5 + 0x18], %l3
npc: 0x7806b6a8 ipf:fr_scanlist+0xb8: subcc %l3, %g0, %g0 ( cmp %l3, %g0 )
global: %g1 0x1
        %g2 0x2a101176950      %g3 0x20
        %g4 0x8                %g5 0x5e
        %g6 0                  %g7 0x30004e8a420
out:    %o0 0x11               %o1 0x4200ff11
        %o2 0                  %o3 0x2a101176930
        %o4 0                  %o5 0
        %sp 0x2a101175c31      %o7 0x78070340
loc:    %l0 0                  %l1 0x19
        %l2 0x40000011         %l3 0x696e74722c6c6172
        %l4 0x300025e3e74      %l5 0x300025e3e48
        %l6 0x1                %l7 0x4200ff11
in:     %i0 0x2a101176950      %i1 0x300025e5000
        %i2 0x300025e5060      %i3 0x3000c6b73c0
        %i4 0x8                %i5 0x2f6465762f6d642f
        %fp 0x2a101175e11      %i7 0x7806bdd8
<trap>ipf:fr_scanlist+0xb4 (0x2a101176950, 0x300025e5000, 0x300025e5060, 0x3000c6b73c0, 0x8, 0x2f6465762f6d642f)
ipf:fr_scanlist+0x7e8 (0x300025e7070, 0x300025e7000, 0x300025e7060, 0x4015, 0x0, 0x300025e7000)
ipf:fr_check+0x618 (0x3000b779d98, 0x3000c6b73c0, 0x0, 0x1, 0x2a101176930, 0x202)
ipf:fr_precheck+0xb8c (0x3000c6b73c0, 0x1c, 0xe, 0x1, 0x3000b779d98, 0x3000c6b73c0)
ipf:fr_qout+0x3ec (0x30001bd93f0, 0x3000c6b73c0, 0x20, 0x3000b779dac, 0xee5e932, 0x3000019e188)
unix:putnext+0x1cc (0x30001bd8a70, 0x30001b5b5f8, 0x0, 0x3000c6b73c0, 0x0, 0x0)
ip:ip_wput_ire+0x7d4 (0xf0000000, 0x0, 0x30003069818, 0x30001bd8a70, 0x30005004b38, 0x3000c6b73c0)
ip:ip_wput+0x2b8 (0x30005f94a80, 0x30003069828, 0x30005004b38, 0x30005f94a80, 0x0, 0x0)
ipf:ipf_ip_qin+0x58 (0x30005004b38, 0x3000c6b73c0, 0x20, 0x8, 0x2a101177a00, 0x2a1011779d0)
unix:putnext+0x1cc (0x30001b51f80, 0x30005007980, 0x20, 0x3000c6b73c0, 0x30001b51f88, 0x30001b51f80)
udp:udp_wput+0x5a8 (0x3000b779dac, 0xffff, 0x5e, 0xa1, 0x300030991c8, 0x3000c6b73c0)
unix:putnext+0x1cc (0x30005004d98, 0x30005007980, 0x300098de140, 0x300098de140, 0x0, 0x0)
genunix:strput+0x264 (0x0, 0x2a101177a00, 0x30005004d98, 0x4, 0x0, 0x0)
genunix:kstrputmsg+0x314 (0x3000c6c9a00, 0x0, 0x0, 0x0, 0x0, 0x30005004d98)
sockfs:sosend_dgram+0x250 (0x10, 0x300066f0e20, 0x8, 0x2a101177a00, 0x0, 0x30005009810)
sockfs:sosendmsg+0x450 (0x0, 0x20, 0x6, 0x8, 0x2a101177a00, 0x2a1011779d0)
sockfs:sendit+0x134 (0x56, 0x30005009810, 0x4, 0x8, 0x2a101177a00, 0x2a1011779d0)
sockfs:sendto+0x74 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
sockfs:sendto32+0x34 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
unix:syscall_trap32+0xa8 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
-- switch to user thread's user stack --
ipf:fr_scanlist+0xb4: 3: ldx [%i5 + 0x18], %l3
ipf:fr_scanlist+0xb8: subcc %l3, %g0, %g0 ( cmp %l3, %g0 )
!! So, where did we set %i5?
SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep ', %i5'
ipf:fr_scanlist+0x3c: ldx [%l1], %i5
ipf:fr_scanlist+0x880: 43: ldx [%i5], %i5
!! Ok, we are going down a linked list and choke when trying
!! to follow it. How do we initially set %i5?
ipf:fr_scanlist+0x0: save %sp, -0x1e0, %sp
ipf:fr_scanlist+0x4: stw %i0, [%fp + 0x7fb]
ipf:fr_scanlist+0x8: stx %i1, [%fp + 0x7ef]
ipf:fr_scanlist+0xc: stx %i2, [%fp + 0x7e7]
ipf:fr_scanlist+0x10: stx %i3, [%fp + 0x7df]
ipf:fr_scanlist+0x14: ldx [%fp + 0x7e7], %l4
%l4 = 0x2a101176930
ipf:fr_scanlist+0x18: add %l4, 0x8, %l3
ipf:fr_scanlist+0x1c: stx %l3, [%fp + 0x7c7]
ipf:fr_scanlist+0x20: stw %g0, [%fp + 0x7bf] ( clr [%fp + 0x7bf] )
ipf:fr_scanlist+0x24: stw %g0, [%fp + 0x7b7] ( clr [%fp + 0x7b7] )
ipf:fr_scanlist+0x28: stw %g0, [%fp + 0x7b3] ( clr [%fp + 0x7b3] )
ipf:fr_scanlist+0x2c: stx %g0, [%fp + 0x79f]
ipf:fr_scanlist+0x30: lduw [%fp + 0x7fb], %l0
ipf:fr_scanlist+0x34: stw %l0, [%fp + 0x7af]
ipf:fr_scanlist+0x38: add %l4, 0x50, %l1
%l1 = 0x2a101176980
ipf:fr_scanlist+0x3c: ldx [%l1], %i5
%i5 = 0x0
!! This is a third-party driver, so we do not know what
!! structure we are dealing with, but we can still run
!! slist by borrowing another structure that has a pointer
!! as its first member and something like another
!! pointer at offset 0x18. It turns out that mblk_t will
!! work nicely.
SolarisCAT(vmcore.0)> stype mblk_t
typedef mblk_t = struct msgb { (size: 0x40 bytes)
struct msgb *b_next; (offset 0x0 bytes, size 0x8 bytes) <<-- Pointer we need
struct msgb *b_prev; (offset 0x8 bytes, size 0x8 bytes)
struct msgb *b_cont; (offset 0x10 bytes, size 0x8 bytes)
unsigned char *b_rptr; (offset 0x18 bytes, size 0x8 bytes) <<-- 8 bytes at offset 0x18
unsigned char *b_wptr; (offset 0x20 bytes, size 0x8 bytes)
struct datab *b_datap; (offset 0x28 bytes, size 0x8 bytes)
unsigned char b_band; (offset 0x30 bytes, size 0x1 bytes)
unsigned char b_ftflag; (offset 0x31 bytes, size 0x1 bytes)
unsigned short b_flag; (offset 0x32 bytes, size 0x2 bytes)
typedef queue_t = struct queue *b_queue; (offset 0x38 bytes, size 0x8 bytes)
} ;
!! But I cannot run slist, because %i5 first winds up
!! being 0x0. So what was passed in %i2? That is what
!! was stored into [%fp + 0x7e7].
ipf:fr_scanlist+0x7e8: call ipf:fr_scanlist
ipf:fr_scanlist+0x7ec: or %l1, %g0, %o2 ( mov %l1, %o2 )
frame @ 0x2a101176610(%sp:0x2a101175e11) on user thread's stack, size 0x1e0(MINFRAME+0x130)
ipf:fr_scanlist+0x7e8 call ipf:fr_scanlist
loc:    %l0 0x300025e6e00      %l1 0x2a101176930
        %l2 0                  %l3 0x1
        %l4 0x4015             %l5 0x300025e7000
        %l6 0                  %l7 0x300025e709c
in:     %i0 0x300025e7070      %i1 0x300025e7000
        %i2 0x300025e7060      %i3 0x4015
        %i4 0                  %i5 0x300025e7000
        %fp 0x2a101175ff1      %i7 ipf:fr_check+0x618
!! Hmm, the value in %l1, which was used to pass the
!! argument in %o2, matches what we pull out of [%fp + 0x7e7].
!! So how do we wind up with a 0x0 value at +0x3c?
!! Maybe we wrote to [%l1] or [%fp + 0x7e7]?
SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep '\[%fp + 0x7e7\]'
ipf:fr_scanlist+0xc: stx %i2, [%fp + 0x7e7]
SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep '\[%l1\]'
ipf:fr_scanlist+0x40: stx %g0, [%l1]
ipf:fr_scanlist+0x674: stx %l5, [%l1]
!! Well, well, well. There are two places where we possibly
!! overwrote the value at [%l1]. Do we branch after 0x674 back to before 0xb4?
SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep 'b)'
ipf:fr_scanlist+0x894: bne,pt %xcc, ipf:fr_scanlist+0x98 (2b)
!! Yeppers. So we have two places where we possibly
!! overwrote the value in [%l1] with another value, notably
!! 0x0. At this point, I would send the customer to the
!! vendor of the ipf code.
SolarisCAT(vmcore.0)> modinfo -p ipf
id flags modctl textaddr size cnt name
94 LI 0x300023b3b20 0x78068000 0x22349 1 ipf (IP Filter: v3.4.29)
--
Paul B. Henson | (909) 869-3781 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | [EMAIL PROTECTED]
California State Polytechnic University | Pomona CA 91768