Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-09 Thread Justin Piszcz
On Thu, 8 Nov 2007, Carlos Carvalho wrote: Jeff Lessem ([EMAIL PROTECTED]) wrote on 6 November 2007 22:00: Dan Williams wrote: The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-09 Thread Jeff Lessem
Dan Williams wrote: On 11/8/07, Bill Davidsen [EMAIL PROTECTED] wrote: Jeff Lessem wrote: Dan Williams wrote: The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags.

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread BERTRAND Joël
BERTRAND Joël wrote: Chuck Ebbert wrote: On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ?

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread Justin Piszcz
On Thu, 8 Nov 2007, BERTRAND Joël wrote: BERTRAND Joël wrote: Chuck Ebbert wrote: On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread Bill Davidsen
Jeff Lessem wrote: Dan Williams wrote: The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags. I tried this patch (against a stock 2.6.23), and it did not work for

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-08 Thread Carlos Carvalho
Jeff Lessem ([EMAIL PROTECTED]) wrote on 6 November 2007 22:00: Dan Williams wrote: The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags. I tried this patch

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël
Dan Williams wrote: On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote: Done. Here is the obtained output: Much appreciated. [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 [ 1260.980606] check 5: state 0x6 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread BERTRAND Joël
Chuck Ebbert wrote: On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-07 Thread Chuck Ebbert
On 11/05/2007 03:36 AM, BERTRAND Joël wrote: Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Done. Here is the obtained output: [ 1260.967796] for sector 7629696, rmw=0 rcw=0 [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 [ 1260.980606] check 5: state 0x6 toread read write f800ffcffcc0 written [

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Justin Piszcz
On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written [ 1265.941328] check 3: state 0x1 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written [ 1265.941328] check 3: state 0x1 toread read

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Justin Piszcz
On Tue, 6 Nov 2007, BERTRAND Joël wrote: Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written [ 1265.941328] check

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread BERTRAND Joël
Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Justin Piszcz wrote: On Tue, 6 Nov 2007, BERTRAND Joël wrote: Done. Here is the obtained output: [ 1265.899068] check 4: state 0x6 toread read write f800fdd4e360 written

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Dan Williams
On Tue, 2007-11-06 at 03:19 -0700, BERTRAND Joël wrote: Done. Here is the obtained output: Much appreciated. [ 1260.969314] handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0 [ 1260.980606] check 5: state 0x6 toread read write

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-06 Thread Jeff Lessem
Dan Williams wrote: The following patch, also attached, cleans up cases where the code looks at sh->ops.pending when it should be looking at the consistent stack-based snapshot of the operations flags. I tried this patch (against a stock 2.6.23), and it did not work for me. Not only did I/O

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread BERTRAND Joël
Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread Dan Williams
On 11/4/07, Justin Piszcz [EMAIL PROTECTED] wrote: On Mon, 5 Nov 2007, Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread Justin Piszcz
On Mon, 5 Nov 2007, Dan Williams wrote: On 11/4/07, Justin Piszcz [EMAIL PROTECTED] wrote: On Mon, 5 Nov 2007, Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-05 Thread Dan Williams
On 11/5/07, Justin Piszcz [EMAIL PROTECTED] wrote: [..] Are you seeing the same "md thread takes 100% of the CPU" that Joël is reporting? Yes, in another e-mail I posted the top output with md3_raid5 at 100%. This seems too similar to Joël's situation for them not to be correlated, and it

Re: 2.6.23.1: mdadm/raid5 hung/d-state (md3_raid5 stuck in endless loop?)

2007-11-04 Thread Justin Piszcz
Time to reboot, before reboot: top - 07:30:23 up 13 days, 13:33, 10 users, load average: 16.00, 15.99, 14.96 Tasks: 221 total, 7 running, 209 sleeping, 0 stopped, 5 zombie Cpu(s): 0.0%us, 25.5%sy, 0.0%ni, 74.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8039432k total, 1744356k used,

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Michael Tokarev
Justin Piszcz wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush] After several days/weeks,

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread BERTRAND Joël
Justin Piszcz wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush] After several days/weeks, this

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Justin Piszcz
On Sun, 4 Nov 2007, BERTRAND Joël wrote: Justin Piszcz wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Michael Tokarev
Justin Piszcz wrote: On Sun, 4 Nov 2007, Michael Tokarev wrote: [] The next time you come across something like that, do a SysRq-T dump and post that. It shows a stack trace of all processes - and in particular, where exactly each task is stuck. Yes I got it before I rebooted, ran that and

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread David Greaves
Michael Tokarev wrote: Justin Piszcz wrote: On Sun, 4 Nov 2007, Michael Tokarev wrote: [] The next time you come across something like that, do a SysRq-T dump and post that. It shows a stack trace of all processes - and in particular, where exactly each task is stuck. Yes I got it before

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Neil Brown
On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ? D Oct21 13:00 [pdflush]

Re: 2.6.23.1: mdadm/raid5 hung/d-state

2007-11-04 Thread Justin Piszcz
On Mon, 5 Nov 2007, Neil Brown wrote: On Sunday November 4, [EMAIL PROTECTED] wrote: # ps auxww | grep D USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 273 0.0 0.0 0 0 ? D Oct21 14:40 [pdflush] root 274 0.0 0.0 0 0 ?