Hi,

On RAID initialization, or later during a resync, our systems become
unresponsive. Ping still works, but ssh logins won't succeed until the
resync has finished. On a serial or local console one can still type, but,
as with ssh, whatever you ask the system to do won't be done until the RAID
sync is done.

This is with 2.6.22, but as far as I remember we also observed this with
2.6.23. Also, the larger the stripe cache size, the higher the
probability that the system goes into this state.

The system is booted diskless over NFS, so apart from the resync there is
absolutely no I/O to the disks.

[ 3017.702688] SysRq : HELP : loglevel0-8 reBoot tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks
[ 3017.742667] SysRq : Show Blocked State
[ 3017.746617]
[ 3017.746618]                                  free                        sibling
[ 3017.755846]   task                 PC        stack   pid father child younger older
[ 3017.763830] md0_resync    D 000002bea0dece63     0  8909      2 (L-TLB)
[ 3017.770737]  ffff810123905ba0 0000000000000046 0000000000000000 0000000000000000
[ 3017.778424]  0000000300000000 ffff81012467bc10 000000010009bbd1 ffff810129e25050
[ 3017.786078]  00000000000001dc ffff81012b59f570 ffff810129e24ea0 0000000000000000
[ 3017.793523] Call Trace:
[ 3017.796270]  [<ffffffff881ed509>] :raid456:get_active_stripe+0x459/0x540
[ 3017.803190]  [<ffffffff881f2f71>] :raid456:sync_request+0x831/0x850
[ 3017.809607]  [<ffffffff8817ba19>] :md_mod:md_do_sync+0x539/0x930
[ 3017.815745]  [<ffffffff88177fc9>] :md_mod:md_thread+0x49/0x140
[ 3017.821705]  [<ffffffff80249adc>] kthread+0x6c/0xa0
[ 3017.826712]  [<ffffffff8020a888>] child_rip+0xa/0x12
[ 3017.831793]
[ 3017.833331] md1_resync    D 000002be9f6f1c7d     0  8917      2 (L-TLB)
[ 3017.840276]  ffff810123cffba0 0000000000000046 0000000000000000 0000000000000000
[ 3017.847955]  0000000300000000 ffff81012946c490 000000010009bbc8 ffff810129dfdaa0
[ 3017.855721]  000000000000073b ffff81012b59e100 ffff810129dfd8f0 0000000000000000
[ 3017.863225] Call Trace:
[ 3017.865915]  [<ffffffff881ed50e>] :raid456:get_active_stripe+0x45e/0x540
[ 3017.872946]  [<ffffffff881f2f71>] :raid456:sync_request+0x831/0x850
[ 3017.879510]  [<ffffffff8817ba19>] :md_mod:md_do_sync+0x539/0x930
[ 3017.885775]  [<ffffffff88177fc9>] :md_mod:md_thread+0x49/0x140
[ 3017.891865]  [<ffffffff80249adc>] kthread+0x6c/0xa0
[ 3017.896957]  [<ffffffff8020a888>] child_rip+0xa/0x12
[ 3017.902135]
[ 3017.903685] md2_resync    D 000002be9e4bded5     0  8925      2 (L-TLB)
[ 3017.910662]  ffff81012279dba0 0000000000000046 0000000000000000 0000000000000000
[ 3017.918227]  0000000000000000 0000000000000000 000000010009bbc2 ffff810129dfd3d0
[ 3017.925785]  000000000000024c ffff81012b510750 ffff810129dfd220 0000000000000000
[ 3017.933137] Call Trace:
[ 3017.935825]  [<ffffffff881ed50e>] :raid456:get_active_stripe+0x45e/0x540
[ 3017.942613]  [<ffffffff881f2f71>] :raid456:sync_request+0x831/0x850
[ 3017.948972]  [<ffffffff8817ba19>] :md_mod:md_do_sync+0x539/0x930
[ 3017.955071]  [<ffffffff88177fc9>] :md_mod:md_thread+0x49/0x140
[ 3017.960960]  [<ffffffff80249adc>] kthread+0x6c/0xa0
[ 3017.965883]  [<ffffffff8020a888>] child_rip+0xa/0x12
[ 3017.970894]
[ 3017.972417] mcelog        D 000002bae6ba88a2     0  9005   9003 (NOTLB)
[ 3017.979169]  ffff810115b09dd8 0000000000000082 0000000000000000 0000000000000000
[ 3017.986753]  ffff81012fd7b9e0 ffffffff80265bc5 000000010009ac27 ffff81012a84a3f0
[ 3017.994312]  0000000000001438 ffff81012b5f8810 ffff81012a84a240 0000000000000000
[ 3018.001671] Call Trace:
[ 3018.004341]  [<ffffffff804ed69e>] wait_for_completion+0x9e/0xf0
[ 3018.010347]  [<ffffffff8024783c>] synchronize_rcu+0x3c/0x50
[ 3018.015985]  [<ffffffff80213fb8>] mce_read+0x118/0x240
[ 3018.021189]  [<ffffffff8028e265>] vfs_read+0xb5/0x170
[ 3018.026287]  [<ffffffff8028e623>] sys_read+0x53/0x90
[ 3018.031325]  [<ffffffff80209a6e>] system_call+0x7e/0x83
[ 3018.036619]  [<00002b32d97b9cd0>]
[ 3018.039963]

Any ideas?

Thanks in advance,
Bernd


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
