Greetings,
We're seeing recurring kernel panics on some Solaris 9 devices that
appear to be caused by ipfilter. Sun support has analyzed the core
dumps and attributes the panics to ipf. We're running version 3.4.32.
Is there a known issue in this area, and would moving to a newer
version such as 4.1.8 correct the problem?
Here's the debug output from one core file, with some inline comments
by the Sun analyst.
------------------ BEGIN DEBUG -------------------
core file: /cores/64649137/64649137vmcore.2
user: Cores User (cores:911)
release: 5.9 (64-bit)
version: Generic_112233-10
machine: sun4u
node name: npls1-dpe01
hw_provider: Sun_Microsystems
system type: SUNW,Sun-Fire-V440
hostid: >>removed<<
time of crash: Mon Jul 4 04:16:37 MDT 2005
age of system: 26 days 9 hours 18 minutes 55.11 seconds
panic CPU: 2 (4 CPUs, 16G memory)
panic string: BAD TRAP: type=31 rp=2a1004a4f70 addr=81a0000100000018 mmu_fsr=0
==== panic kernel thread: 0x2a1004a5d40 pid: 0 on CPU: 2 ====
cmd: sched
t_procp: 0x14382d8(proc_sched) p_as: 0x14381c0(kas)
t_stk: 0x2a1004a5b50 sp: 0x1437511 t_stkbase: 0x2a1004a2000
t_pri: 60(SYS) pctcpu: 0.000000 t_lwp: 0x0
psrset: 0 last CPU: 2
idle: 1 ticks (0.01 seconds)
start: Tue Jun 7 18:58:30 2005
age: 2279887 seconds (26 days 9 hours 18 minutes 7 seconds)
stime: 1906 (26 days 9 hours 18 minutes 36.05 seconds earlier)
()
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
T_PANIC - thread initiated a system panic
tpflg: none set
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
TS_SIGNALLED - thread was awakened by cv_signal()
pflag: SSYS - system resident process
SLOAD - in core
SLOCK - process cannot be swapped
pc: 0x104a464 unix:panicsys+0x44: call unix:setjmp
startpc: 0x78053cbc ce:ce_drain_fifo+0x0: save %sp, -0x280, %sp
unix:panicsys+0x44(0x1059310, 0x2a1004a4d28, 0x1437ee0, 0x1, 0x0, 0xd4,
0x8900001600, , , , , , , , 0x1059310, 0x2a1004a4d28)
unix:vpanic+0xcc(0x1059310, 0x2a1004a4d28, 0x1, 0x1, 0x8, 0x8)
unix:panic+0x1c(0x1059310, 0x31, 0x2a1004a4f70, 0x81a0000100000018, 0x0,
0x2000)
unix:die+0xa4(0x31, 0x2a1004a4f70, 0x81a0000100000018, 0x0,
0x300003c2f10, 0x30005b4fb02)
unix:trap+0x874(0x2a1004a4f70, 0x81a0000100000018, , , 0x81a00001, 0x0)
unix:ktl0+0x48()
-- trap data type: 0x31 (data access MMU miss) rp: 0x2a1004a4f70 --
addr: 0x81a0000100000018
pc: 0x13da3a0 ipf:fr_scanlist+0xe0: ldx [%l0 + 0x18], %l0
npc: 0x13da3a4 ipf:fr_scanlist+0xe4: subcc %l0, %g0, %g0 ( cmp %l0, %g0 )
global: %g1 0x4
%g2 0x1 %g3 0x8000
%g4 0x3000444de40 %g5 0x92
%g6 0 %g7 0x2a1004a5d40
out: %o0 0 %o1 0x2a1004a5330
%o2 0x11c5221 %o3 0x11c5221
%o4 0x8 %o5 0x8
%sp 0x2a1004a4811 %o7 0x13daa38
loc: %l0 0x81a0000100000000 %l1 0x300052fdbb8
%l2 0 %l3 0x42c90c85000630a4
%l4 0x30005b4fb24 %l5 0x3c
%l6 0x300082b75a0 %l7 0x1438000
in: %i0 0x202 %i1 0x30005b4fb10
%i2 0x2a1004a5330 %i3 0x30007344d40
%i4 0x3b %i5 0
%fp 0x2a1004a49f1 %i7 0x13db594
<trap>ipf:fr_scanlist+0xe0(0x202, 0x30005b4fb10, 0x2a1004a5330,
0x30007344d40, 0x3b, 0x0)
ipf:fr_check+0x6c4(0x30005b4fb10, 0x14, 0x300052fdbb8, 0x0,
0x2a1004a56a8, 0x2a1004a5810)
ipf:fr_precheck+0xddc(0x2a1004a5810, 0x300052fe2a0, 0x2a1004a56a8, 0x0,
0x0, 0xd4)
ipf:fr_qin+0x3bc(0x300052fe2a0, 0x30007344d40, 0x20, 0x1, 0x30007344d40,
0x30000244000)
unix:putnext+0x21c(0x300052fe528, 0x30007344d40, , 0x1, 0x30005b4fb10,
0xffff)
ce:ce_drain_fifo+0x52e8(0x30005b53288, 0x0)
unix:thread_start+0x4()
-- end of kernel thread's stack --
ipf:fr_scanlist+0xd0: sub %l0, 0x1, %l0 ( dec %l0 )
ipf:fr_scanlist+0xd4: ba,pt %icc, ipf:fr_scanlist+0xb64 (44f)
ipf:fr_scanlist+0xd8: stw %l0, [%fp + 0x7b7]
ipf:fr_scanlist+0xdc: 3: ldx [%fp + 0x7cf], %l0
ipf:fr_scanlist+0xe0: ldx [%l0 + 0x18], %l0 <--- HERE is where we panic
And the reason is that %l0 contains the following:
%l0 0x81a0000100000000
Notice we load it just above off of our frame pointer
0x2a1004a49f1 = 0
SolarisCAT(64649137vmcore.2)> rd 0x2a1004a49f1+0x7cf
0x2a1004a51c0 = 0x81a0000100000000
SolarisCAT(64649137vmcore.2)>
I've seen this panic before; it's an issue with "ipf", which is not our
product (we only support it in Solaris 10).
SolarisCAT(64649137vmcore.2)> modinfo -p ipf
id flags modctl textaddr size cnt name
108 LIN 0x30005cfeca0 0x13d5ffe 0x29d59 1 ipf (IP Filter: v3.4.32)
SolarisCAT(64649137vmcore.2)>
------------------ END DEBUG -------------------
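For anyone who wants to double-check the analyst's arithmetic, here is a
small Python sketch. The constant and function names are mine and purely
illustrative; the values are copied from the register dump and the
SolarisCAT output above. It only confirms that the two loads in
fr_scanlist resolve to the addresses shown in the dump. It does not
explain how the bad value got into that stack slot.

```python
# Values copied from the register dump and SolarisCAT output in this post.
FP = 0x2A1004A49F1             # %fp at the time of the trap
FP_OFFSET = 0x7CF              # from "ldx [%fp + 0x7cf], %l0"
L0 = 0x81A0000100000000        # value read from that slot (also shown in %l0)
FIELD_OFFSET = 0x18            # from "ldx [%l0 + 0x18], %l0"
TRAP_ADDR = 0x81A0000100000018 # "addr" reported in the trap data

MASK64 = 0xFFFFFFFFFFFFFFFF    # keep results within 64 bits


def effective_address(base, offset):
    """Return base + offset as a 64-bit address."""
    return (base + offset) & MASK64


# Stack slot the first ldx reads from; should match the "rd" output.
slot = effective_address(FP, FP_OFFSET)

# Address the second ldx dereferences; should match the trap address.
fault = effective_address(L0, FIELD_OFFSET)

print("stack slot read by first ldx:  ", hex(slot))
print("address used by second ldx:    ", hex(fault))
print("matches reported trap address: ", fault == TRAP_ADDR)
```

The first result agrees with the address SolarisCAT prints for the "rd"
command, and the second agrees with the addr field of the panic string,
so the trap is consistent with dereferencing the value held in %l0.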
Thanks for any advice or pointers to known issues!
Bill Sweeney
"The power of accurate observation is commonly called cynicism by those
who have not got it."
- George Bernard Shaw