does anybody have any useful tips about this?
yours.
Original Message
hi,
after we switched our servers from centos-3 to centos-4 (aka rhel-4), one
of our servers crashes once a week without any oops. this happens
with both the normal kernel-2.6.9-11.EL and
kernel-2.6.9-11.106.unsupported. we replaced the motherboard, the
raid controller and the cables too, and it still crashes. finally we set up
netdump, and yesterday we got a crash log and a core
file. it seems there is a bug in the raid5 code of the kernel.
this is our backup server with 8 x 200GB hdds in a raid5 (for the data)
plus 2 x 40GB hdds in a raid1 (for the system) with a 3ware 8xxx raid
controller, running. i have attached the netdump log of the last crash.
how can i fix it?
yours.
--
Levente Si vis pacem para bellum!
RAID5 conf printout:
--- rd:8 wd:8 fd:0
disk 0, o:1, dev:sda1
disk 1, o:1, dev:sdb1
disk 2, o:1, dev:sdc1
disk 3, o:1, dev:sdd1
disk 4, o:1, dev:sde1
disk 5, o:1, dev:sdf1
disk 6, o:1, dev:sdg1
disk 7, o:1, dev:sdh1
Unable to handle kernel NULL pointer dereference at virtual address
printing eip:
*pde = 0f94a067
Oops: [#1]
Modules linked in: cifs nls_utf8 ncpfs nfsd exportfs lockd sunrpc parport_pc lp
parport netconsole netdump i2c_dev i2c_core ipx dm_mod e1000 tg3 floppy ext3
jbd raid5 xor raid1 3w_ sd_mod scsi_mod
CPU: 0
EIP: 0060:[] Not tainted VLI
EFLAGS: 00010246 (2.6.9-11.106.unsupported)
EIP is at 0x0
eax: c1806138 ebx: c018961c ecx: 0016 edx: c035c7f4
esi: e7182200 edi: 0001 ebp: c18fb380 esp: f7878f34
ds: 007b es: 007b ss: 0068
Process md2_raid5 (pid: 224, threadinfo=f7878000 task=f7872600)
Stack: f7b973c0 f8879a26 md_thread+0x20d/0x23a
[c011ceaf] autoremove_wake_function+0x0/0x2d
[c030ce1a] ret_from_fork+0x6/0x14
[c011ceaf] autoremove_wake_function+0x0/0x2d
[c02a183f] md_thread+0x0/0x23a
[c01041d9] kernel_thread_helper+0x5/0xb
Code: Bad EIP value.
Pid: 224, comm: md2_raid5
EIP: 0060:[] CPU: 0
EIP is at 0x0
EFLAGS: 00010246 Not tainted (2.6.9-11.106.unsupported)
EAX: c1806138 EBX: c018961c ECX: 0016 EDX: c035c7f4
ESI: e7182200 EDI: 0001 EBP: c18fb380 DS: 007b ES: 007b
CR0: 8005003b CR2: ffd5 CR3: 0fd6b000 CR4: 06d0
[f8879a26] handle_stripe+0xfca/0x1207 [raid5]
[f887a7d5] raid5d+0x197/0x2ab [raid5]
[c02a1a4c] md_thread+0x20d/0x23a
[c011ceaf] autoremove_wake_function+0x0/0x2d
[c030ce1a] ret_from_fork+0x6/0x14
[c011ceaf] autoremove_wake_function+0x0/0x2d
[c02a183f] md_thread+0x0/0x23a
[c01041d9] kernel_thread_helper+0x5/0xb
sibling
task PC pid father child younger older
init S C01458E9 920 1 0 2 (NOTLB)
f7f44eb0 0086 0055 xfrm_state_flush+0x2/0x289 tcp_poll+0x31/0x144
[c01768e1] do_select+0x347/0x378
[c0176461] __pollwait+0x0/0x94
[c0176c05] sys_select+0x2e0/0x43a
[c030cefb] syscall_call+0x7/0xb
ntpd S 00D0 2516 2196 1 2219 2172 (NOTLB)
f0885eb0 0082 0246 00d0 cf8553a0 21cd 5197434f 3abd
f697d2a0 f697d42c f6a0d580 7fff f0885f74 c030b7e5 f69a5980
f0885f58 f69a5980 f106ed18 c017648e 0246 f106b800 f0885f58
Call Trace:
[c030b7e5] schedule_timeout+0x50/0x10c
[c017648e] __pollwait+0x2d/0x94
[c02aeeac] datagram_poll+0x25/0xd1
[c01768e1] do_select+0x347/0x378
[c0176461] __pollwait+0x0/0x94
[c0176c05] sys_select+0x2e0/0x43a
[c01058d8] sys_sigreturn+0x1ce/0x1f2
[c030cefb] syscall_call+0x7/0xb
rpc.rquotad S 3416 2219 1 2223 2196 (NOTLB)
f65a4f1c 0082 0001 f697ccd0 00030ee2 966e2a43 0033
f69b1320 f69b14ac 7fff f65a4fa0 f0888ba0 c030b7e5 f106b580
f65a4fa0 f106e518 c02aeeac f6a7a780 c03806c0 0145 f0888bb0 0001
Call Trace:
[c030b7e5] schedule_timeout+0x50/0x10c
[c02aeeac] datagram_poll+0x25/0xd1
[c02a8f2c] sock_poll+0x12/0x14
[c0176db3] do_pollfd+0x54/0x77
[c0176e63] do_poll+0x8d/0xab
[c0177020] sys_poll+0x19f/0x24f
[c0176461] __pollwait+0x0/0x94
[c030cefb] syscall_call+0x7/0xb
nfsd S FF4DCFB0 2316 2223 1 2224 2219 (L-TLB)
f05fff10 0046 0002 ff4dcfb0 f69c4dd0 13cc c85b7bd5 37c8
f69b0d50 f69b0edc 03db8cae 03db8cae 000b c1993c00 c030b886 f5675f18
c035b0d0 03db8cae 1d244b3c 0005 c031a0b5 c031c25c 00a8
Call Trace:
[c030b886] schedule_timeout+0xf1/0x10c
[c0129336] process_timeout+0x0/0x5
[f8ade6bb] svc_recv+0x325/0x65b [sunrpc]
[c011b856] default_wake_function+0x0/0xc
[c011b921] __wake_up+0x6e/0xca
[c011b856] default_wake_function+0x0/0xc
[c012d587] sigprocmask+0x140/0x1f4
[f8b2e44d] nfsd+0x1ae/0x540 [nfsd]
[f8b2e29f] nfsd+0x0/0x540 [nfsd]
[c01041d9] kernel_thread_helper+0x5/0xb
nfsd S 37C3 3472 2224 1 2225 2223 (L-TLB)
f1513f10