Sun provided the following quite detailed analysis of my recent kernel
panic, which appears to have been caused by ipf while traversing a linked
list. Any thoughts? Darren, the core file is still available if you'd like
to take a look at it.

Thanks...


Core analysis of vmcore.0:

core file:      /cores/63354516/vmcore.0
release:        5.8 (64-bit)
version:        Generic_108528-15
machine:        sun4u
node name:      karm
hw_provider:    Sun_Microsystems
system type:    SUNW,Sun-Fire-280R
hostid:         83180cda
time of crash:  Thu Jan  9 22:40:27 MST 2003
age of system:  6 days 10 hours 27 minutes 51.83 seconds
panic cpu:      1 (ncpus: 2)
panic string:   BAD TRAP: type=34 rp=2a101176390 addr=2f6465762f6d6447 mmu_fsr=0

SolarisCAT(vmcore.0)> panic
panic on cpu 1
panic string:   BAD TRAP: type=34 rp=2a101176390 addr=2f6465762f6d6447 mmu_fsr=0
==== panic user thread: 0x30004e8a420  pid: 1138  on cpu: 1 ====
cmd: /opt/CSCOpx/objects/availability/bin/avpoller -i
/opt/CSCOpx/objects/availabili

t_stk: 0x2a101177af0  sp: 0x10422cb1  t_stkbase: 0x2a101172000
t_pri: 51(TS)  t_lwp: 0x30004e86078  machpcb: 0x2a101177af0
t_procp: 0x300028c6050  p_as: 0x30001baea88  hat: 0x3000005cd60  cnum: 0x216
  size: 22503424  rss: 17752064
bound cpuid: 1  last cpuid: 1
idle: 1 ticks (0.01 seconds)
start: Fri Jan  3 12:15:48 2003
age: 555879 seconds (6 days 10 hours 24 minutes 39 seconds)
stime: 55607182 (0.01 seconds earlier)
syscall: sendto (0x1995c8)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_PANIC - thread initiated a system panic
tpflg:  TP_TWAIT - wait to be freed by lwp_wait
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
pflag:  SLOAD - in core
        SULOAD - u-block in core
        SNOWAIT - children never become zombies

pc: 0x10044704  unix:panicsys+0x44:     call    unix:setjmp

unix:panicsys+0x44 (0x10423680, 0x2a101176148, 0x10050f90, 0x78002000,
0x2a101176930, 0x0)
unix:vpanic+0xcc (0x10050f90, 0x2a101176148, 0xe, 0x1, 0x3000b779d98,
0x3000c6b73c0)
unix:panic+0x1c (0x10050f90, 0x34, 0x2a101176390, 0x2f6465762f6d6447, 0x0,
0x3000019e188)
unix:die+0xa4 (0x34, 0x2a101176390, 0x2f6465762f6d6447, 0x0, 0x2a101176390, 0x0)
unix:trap+0x5d0 (0x2f6465762f6d6447, 0x0, 0x800009, 0x10000, 0x2a101176390, 0x0)
unix:prom_rtt+0x0 (0x11, 0x4200ff11, 0x0, 0x2a101176930, 0x0, 0x0)
-- trap data  type: 0x34 (memory address not aligned)  rp: 0x2a101176390  --
              addr: 0x2f6465762f6d6447
pc:  0x7806b6a4 ipf:fr_scanlist+0xb4:     ldx     [%i5 + 0x18], %l3
npc: 0x7806b6a8 ipf:fr_scanlist+0xb8:     subcc   %l3, %g0, %g0   ( cmp   %l3,
%g0 )
  global:                       %g1                0x1
        %g2      0x2a101176950  %g3               0x20
        %g4                0x8  %g5               0x5e
        %g6                  0  %g7      0x30004e8a420
  out:  %o0               0x11  %o1         0x4200ff11
        %o2                  0  %o3      0x2a101176930
        %o4                  0  %o5                  0
        %sp      0x2a101175c31  %o7         0x78070340
  loc:  %l0                  0  %l1               0x19
        %l2         0x40000011  %l3 0x696e74722c6c6172
        %l4      0x300025e3e74  %l5      0x300025e3e48
        %l6                0x1  %l7         0x4200ff11
  in:   %i0      0x2a101176950  %i1      0x300025e5000
        %i2      0x300025e5060  %i3      0x3000c6b73c0
        %i4                0x8  %i5 0x2f6465762f6d642f
        %fp      0x2a101175e11  %i7         0x7806bdd8
<trap>ipf:fr_scanlist+0xb4 (0x2a101176950, 0x300025e5000, 0x300025e5060,
0x3000c6b73c0, 0x8, 0x2f6465762f6d642f)
ipf:fr_scanlist+0x7e8 (0x300025e7070, 0x300025e7000, 0x300025e7060, 0x4015, 0x0,
0x300025e7000)
ipf:fr_check+0x618 (0x3000b779d98, 0x3000c6b73c0, 0x0, 0x1, 0x2a101176930,
0x202)
ipf:fr_precheck+0xb8c (0x3000c6b73c0, 0x1c, 0xe, 0x1, 0x3000b779d98,
0x3000c6b73c0)
ipf:fr_qout+0x3ec (0x30001bd93f0, 0x3000c6b73c0, 0x20, 0x3000b779dac, 0xee5e932,
0x3000019e188)
unix:putnext+0x1cc (0x30001bd8a70, 0x30001b5b5f8, 0x0, 0x3000c6b73c0, 0x0, 0x0)
ip:ip_wput_ire+0x7d4 (0xf0000000, 0x0, 0x30003069818, 0x30001bd8a70,
0x30005004b38, 0x3000c6b73c0)
ip:ip_wput+0x2b8 (0x30005f94a80, 0x30003069828, 0x30005004b38, 0x30005f94a80,
0x0, 0x0)
ipf:ipf_ip_qin+0x58 (0x30005004b38, 0x3000c6b73c0, 0x20, 0x8, 0x2a101177a00,
0x2a1011779d0)
unix:putnext+0x1cc (0x30001b51f80, 0x30005007980, 0x20, 0x3000c6b73c0,
0x30001b51f88, 0x30001b51f80)
udp:udp_wput+0x5a8 (0x3000b779dac, 0xffff, 0x5e, 0xa1, 0x300030991c8,
0x3000c6b73c0)
unix:putnext+0x1cc (0x30005004d98, 0x30005007980, 0x300098de140, 0x300098de140,
0x0, 0x0)
genunix:strput+0x264 (0x0, 0x2a101177a00, 0x30005004d98, 0x4, 0x0, 0x0)
genunix:kstrputmsg+0x314 (0x3000c6c9a00, 0x0, 0x0, 0x0, 0x0, 0x30005004d98)
sockfs:sosend_dgram+0x250 (0x10, 0x300066f0e20, 0x8, 0x2a101177a00, 0x0,
0x30005009810)
sockfs:sosendmsg+0x450 (0x0, 0x20, 0x6, 0x8, 0x2a101177a00, 0x2a1011779d0)
sockfs:sendit+0x134 (0x56, 0x30005009810, 0x4, 0x8, 0x2a101177a00,
0x2a1011779d0)
sockfs:sendto+0x74 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
sockfs:sendto32+0x34 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
unix:syscall_trap32+0xa8 (0x4, 0xffbece80, 0x56, 0x0, 0x50e21c, 0x10)
-- switch to user thread's user stack --

ipf:fr_scanlist+0xb4:   3:      ldx     [%i5 + 0x18], %l3
ipf:fr_scanlist+0xb8:           subcc   %l3, %g0, %g0   ( cmp   %l3, %g0 )
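
As a quick sanity check on the trap data, the faulting address should be the
effective address of that ldx, i.e. %i5 + 0x18, and it should fail the 8-byte
alignment that ldx requires. The Python sketch below only redoes that
arithmetic with values copied from the register dump. It also renders %i5 and
%l3 as big-endian ASCII, since both happen to consist of printable bytes;
reading anything into that (for example, string data landing on top of a list
node) is an inference on my part, not something the core analysis establishes.

```python
# Values copied from the trap-frame register dump.
i5 = 0x2f6465762f6d642f          # %i5, the list pointer being followed
l3 = 0x696e74722c6c6172          # %l3 at the time of the trap
trap_addr = 0x2f6465762f6d6447   # addr reported in the panic string

ea = i5 + 0x18                   # effective address of: ldx [%i5 + 0x18], %l3
is_aligned = (ea % 8 == 0)       # ldx needs an 8-byte aligned address


def as_ascii(value):
    """Render a 64-bit value as big-endian ASCII (SPARC is big-endian)."""
    return value.to_bytes(8, "big").decode("ascii", errors="replace")


print(hex(ea), ea == trap_addr, is_aligned)
print(repr(as_ascii(i5)), repr(as_ascii(l3)))
```

The computed address matches the panic string and is misaligned, which is
consistent with trap type 0x34.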

!! So, where did we set %i5?

SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep ', %i5'
ipf:fr_scanlist+0x3c:           ldx     [%l1], %i5
ipf:fr_scanlist+0x880:  43:     ldx     [%i5], %i5

!! Ok, we are going down a linked list and choke when trying
!! to follow it.  How do we initially set %i5?

ipf:fr_scanlist+0x0:            save    %sp, -0x1e0, %sp
ipf:fr_scanlist+0x4:            stw     %i0, [%fp + 0x7fb]
ipf:fr_scanlist+0x8:            stx     %i1, [%fp + 0x7ef]
ipf:fr_scanlist+0xc:            stx     %i2, [%fp + 0x7e7]
ipf:fr_scanlist+0x10:           stx     %i3, [%fp + 0x7df]
ipf:fr_scanlist+0x14:           ldx     [%fp + 0x7e7], %l4
       %l4 = 0x2a101176930
ipf:fr_scanlist+0x18:           add     %l4, 0x8, %l3
ipf:fr_scanlist+0x1c:           stx     %l3, [%fp + 0x7c7]
ipf:fr_scanlist+0x20:           stw     %g0, [%fp + 0x7bf]      ( clr   [%fp +
0x7bf] )
ipf:fr_scanlist+0x24:           stw     %g0, [%fp + 0x7b7]      ( clr   [%fp +
0x7b7] )
ipf:fr_scanlist+0x28:           stw     %g0, [%fp + 0x7b3]      ( clr   [%fp +
0x7b3] )
ipf:fr_scanlist+0x2c:           stx     %g0, [%fp + 0x79f]
ipf:fr_scanlist+0x30:           lduw    [%fp + 0x7fb], %l0
ipf:fr_scanlist+0x34:           stw     %l0, [%fp + 0x7af]
ipf:fr_scanlist+0x38:           add     %l4, 0x50, %l1
       %l1 = 0x2a101176980
ipf:fr_scanlist+0x3c:           ldx     [%l1], %i5
       %i5 = 0x0

!! This is a third party driver, so we do not know what
!! structure we are dealing with, but we can still run
!! slist by taking another structure that has a pointer
!! as the first member and has something like another
!! pointer at offset 0x18.  Turns out that mblk_t will
!! work nicely.

SolarisCAT(vmcore.0)> stype mblk_t
typedef mblk_t = struct msgb { (size: 0x40 bytes)
<<-- Pointer we need
   struct msgb *b_next; (offset 0x0 bytes, size 0x8 bytes)
   struct msgb *b_prev; (offset 0x8 bytes, size 0x8 bytes)
   struct msgb *b_cont; (offset 0x10 bytes, size 0x8 bytes)
   unsigned char *b_rptr; (offset 0x18 bytes, size 0x8 bytes)
<<-- 8 bytes at offset 0x18
   unsigned char *b_wptr; (offset 0x20 bytes, size 0x8 bytes)
   struct datab *b_datap; (offset 0x28 bytes, size 0x8 bytes)
   unsigned char b_band; (offset 0x30 bytes, size 0x1 bytes)
   unsigned char b_ftflag; (offset 0x31 bytes, size 0x1 bytes)
   unsigned short b_flag; (offset 0x32 bytes, size 0x2 bytes)
   typedef queue_t = struct queue *b_queue; (offset 0x38 bytes, size 0x8 bytes)
} ;
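
To make the access pattern concrete: the loop reads 8 bytes at offset 0x18 of
the current node (+0xb4) and later reloads the node pointer from offset 0x0
(+0x880). The Python sketch below simulates just that pattern over a toy
memory map. The node addresses 0x1000 and 0x2000, the `walk` and `ldx`
helpers, and the `AlignmentTrap` exception are all hypothetical names for
illustration; none of this is ipf source. Unmapped addresses read as zero,
which is a simplification.

```python
NEXT_OFF, FIELD_OFF = 0x0, 0x18   # offsets seen in the disassembly


class AlignmentTrap(Exception):
    """Stands in for trap type 0x34 (memory address not aligned)."""


def ldx(mem, addr):
    # An 8-byte load must use an 8-byte aligned address.
    if addr % 8:
        raise AlignmentTrap(hex(addr))
    return mem.get(addr, 0)


def walk(mem, head):
    """Follow next pointers from head, reading the field at 0x18 of each node."""
    visited = []
    node = head
    while node != 0:
        field = ldx(mem, node + FIELD_OFF)   # like +0xb4: ldx [%i5 + 0x18], %l3
        visited.append((node, field))
        node = ldx(mem, node + NEXT_OFF)     # like +0x880: ldx [%i5], %i5
    return visited


# Two healthy nodes at hypothetical addresses.
mem = {0x1000: 0x2000, 0x1018: 1, 0x2000: 0, 0x2018: 2}
print(walk(mem, 0x1000))

# Corrupt the second node's next pointer with the %i5 value from the dump.
mem[0x2000] = 0x2f6465762f6d642f
try:
    walk(mem, 0x1000)
except AlignmentTrap as trap:
    print("trap at", trap)
```

With the corrupted pointer, the walk faults on the next field read, at the
same address the panic string reports.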

!! But I cannot run slist because %i5 initially winds
!! up being 0x0.  What was passed in at %i2?  That is
!! what was stuffed into [%fp + 0x7e7].

ipf:fr_scanlist+0x7e8:          call    ipf:fr_scanlist
ipf:fr_scanlist+0x7ec:          or      %l1, %g0, %o2   ( mov   %l1, %o2 )

 frame @ 0x2a101176610(%sp:0x2a101175e11) on user thread's stack, size
0x1e0(MINFRAME+0x130)
ipf:fr_scanlist+0x7e8   call    ipf:fr_scanlist

loc:    %l0      0x300025e6e00  %l1      0x2a101176930
        %l2                  0  %l3                0x1
        %l4             0x4015  %l5      0x300025e7000
        %l6                  0  %l7      0x300025e709c
in:     %i0      0x300025e7070  %i1      0x300025e7000
        %i2      0x300025e7060  %i3             0x4015
        %i4                  0  %i5      0x300025e7000
        %fp      0x2a101175ff1  %i7 ipf:fr_check+0x618
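
For anyone puzzled by the odd-looking %sp/%fp values and the [%fp + 0x7e7]
style offsets: the 64-bit SPARC V9 ABI biases the stack and frame pointers by
2047 (0x7ff). The Python sketch below redoes the frame arithmetic from the
values shown above; the variable names are mine.

```python
STACK_BIAS = 0x7ff               # SPARC V9 64-bit stack bias (2047)

sp = 0x2a101175e11               # %sp of the fr_scanlist+0x7e8 frame
fp = 0x2a101175ff1               # %fp of the same frame

frame = sp + STACK_BIAS          # unbiased frame address
frame_size = fp - sp             # should match the 'save' of -0x1e0

# In the trapping (callee) frame, %fp equals this frame's %sp, so the
# slot where %i2 was saved, [%fp + 0x7e7], sits here:
saved_i2_slot = sp + 0x7e7

print(hex(frame))                # -> 0x2a101176610
print(hex(frame_size))           # -> 0x1e0
print(hex(saved_i2_slot))        # -> 0x2a1011765f8
```

The first two results agree with the "frame @ 0x2a101176610 ... size 0x1e0"
line reported by SolarisCAT.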

!! Hmm, the value in %l1, which was passed as the argument
!! in %o2, matches what we pull out of [%fp + 0x7e7].
!! So, how do we wind up with a 0x0 value at +0x3c?
!! Maybe we wrote to [%l1] or [%fp + 0x7e7]?

SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep '\[%fp + 0x7e7\]'
ipf:fr_scanlist+0xc:            stx     %i2, [%fp + 0x7e7]

SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep '\[%l1\]'
ipf:fr_scanlist+0x40:           stx     %g0, [%l1]
ipf:fr_scanlist+0x674:          stx     %l5, [%l1]

!! Well, well, well.  We possibly overwrote the value at
!! [%l1] twice.  Do we branch after 0x674 to before 0xb4?

SolarisCAT(vmcore.0)> rdi -f fr_scanlist | grep 'b)'
ipf:fr_scanlist+0x894:          bne,pt  %xcc, ipf:fr_scanlist+0x98 (2b)

!! Yeppers.  So, we have two places where we possibly
!! overwrote the value in [%l1] with another value, notably
!! 0x0.  At this point, I would send the customer to the
!! vendor of the ipf code.
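
One possible reading of the 0x0 seen at +0x3c, offered only as a guess: the
store at +0x40 clears [%l1] immediately after the load at +0x3c, so memory
inspected in the dump afterwards would show zero even if %i5 received a
non-NULL head at the time. The Python sketch below illustrates that ordering.
The head value 0x1000 is purely hypothetical, and the dict is a stand-in for
kernel memory.

```python
l4 = 0x2a101176930               # value loaded from [%fp + 0x7e7]
l1 = l4 + 0x50                   # +0x38: add %l4, 0x50, %l1

# Hypothetical memory contents before fr_scanlist runs.
memory = {l1: 0x1000}            # assumed non-NULL list head, for illustration

i5 = memory[l1]                  # +0x3c: ldx [%l1], %i5
memory[l1] = 0                   # +0x40: stx %g0, [%l1]

print(hex(l1))                   # -> 0x2a101176980
print(hex(i5), hex(memory[l1]))  # register keeps the head, memory reads 0x0
```

If that is what happened, the zero in memory says little about what %i5 held
when the walk started.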

SolarisCAT(vmcore.0)> modinfo -p ipf
 id flags        modctl      textaddr     size cnt name
 94 LI    0x300023b3b20    0x78068000  0x22349   1 ipf (IP Filter: v3.4.29)
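
As a cross-check, the trap pc and the return address in %i7 should both fall
inside the ipf module as reported by modinfo. The Python sketch below assumes
the "size" column covers the module's extent starting at textaddr; the
variable names are mine.

```python
textaddr = 0x78068000            # ipf textaddr from modinfo
size = 0x22349                   # ipf size from modinfo

pc = 0x7806b6a4                  # trap pc: ipf:fr_scanlist+0xb4
i7 = 0x7806bdd8                  # %i7 in the trap frame (caller's call site)

fr_scanlist = pc - 0xb4          # start of fr_scanlist implied by the pc


def in_module(addr):
    """True if addr lies within [textaddr, textaddr + size)."""
    return textaddr <= addr < textaddr + size


print(in_module(pc), in_module(i7))
print(hex(fr_scanlist), hex(i7 - fr_scanlist))
```

Both addresses land inside ipf, and %i7 works out to fr_scanlist+0x7e8, in
agreement with the stack trace.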


-- 
Paul B. Henson  |  (909) 869-3781  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
