Re: mdadm and 2.4 kernel?

2006-05-26 Thread Neil Brown
On Thursday May 25, [EMAIL PROTECTED] wrote:
 Hi, for various reasons I'll need to run mdadm on a 2.4 kernel.
 Now I have a 2.4.32 kernel.
 
 Take a look:
 
 [EMAIL PROTECTED]:~# mdadm --create --verbose /dev/md0 --level=1 
 --bitmap=/root/md0bitmap -n 2 /dev/nda /dev/ndb --force --assume-clean
 mdadm: /dev/nda appears to be part of a raid array:
 level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
 mdadm: /dev/ndb appears to be part of a raid array:
 level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
 mdadm: size set to 39118144K
 Continue creating array? y
 mdadm: Warning - bitmaps created on this kernel are not portable
   between different architectures.  Consider upgrading the Linux kernel.
 mdadm: Cannot set bitmap file for /dev/md0: No such device
 

2.4 does not support bitmaps (nor do early 2.6 kernels).
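A quick preflight check along these lines can catch this before mdadm fails. The version boundary below is a simplification (it only flags 2.4-and-older, although, as noted, some early 2.6 kernels also lack bitmap support):

```shell
# Illustrative sketch: refuse --bitmap on kernels that cannot support
# md write-intent bitmaps (2.4 and earlier; early 2.6 is also suspect).
ver=$(uname -r)
case "$ver" in
    2.[0-4].*) echo "kernel $ver: md bitmaps unsupported; drop --bitmap" ;;
    *)         echo "kernel $ver: md bitmaps may be supported" ;;
esac
```

On a 2.4 box this prints the warning branch; the mdadm invocation itself is unchanged apart from omitting `--bitmap`.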

 
 [EMAIL PROTECTED]:~# mdadm --create --verbose /dev/md0 --level=1  -n 2 
 /dev/nda 
 /dev/ndb --force --assume-clean
 mdadm: /dev/nda appears to be part of a raid array:
 level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
 mdadm: /dev/ndb appears to be part of a raid array:
 level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
 mdadm: size set to 39118144K
 Continue creating array? y
 mdadm: SET_ARRAY_INFO failed for /dev/md0: File exists
 [EMAIL PROTECTED]:~# 

It seems /dev/md0 is already active somehow.
Try
  mdadm -S /dev/md0
first.  What does cat /proc/mdstat say?

NeilBrown


 
 Obviously the devices /dev/nda and /dev/ndb exist (I can run fdisk
 on them).
 
 Can someone help me?
 Thanks.
 Stefano.
 
 
 
 
 
 -
 To unsubscribe from this list: send the line "unsubscribe linux-raid" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: problems with raid=noautodetect

2006-05-26 Thread Luca Berra

On Tue, May 23, 2006 at 08:39:26AM +1000, Neil Brown wrote:

Presumably you have a 'DEVICE' line in mdadm.conf too?  What is it?
My first guess is that it isn't listing /dev/sdd? somehow.


Neil,
I am seeing a lot of people fall into this same error, and I would
propose a way of avoiding it:

1) make DEVICE partitions the default if no DEVICE line is specified.
2) deprecate the DEVICE keyword, issuing a warning when it is found in
the configuration file
3) introduce DEVICEFILTER or a similar keyword with the same meaning as
the actual DEVICE keyword
4) optionally add an EXCLUDEDEVICE keyword with the opposite meaning.
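For reference, the difference is only a line or two in mdadm.conf; the paths and UUID below are placeholders, not taken from the thread:

```
# mdadm.conf sketch (illustrative values)
#
# the catch-all form -- scan everything listed in /proc/partitions:
DEVICE partitions
# the restrictive form that silently misses disks the glob does not
# cover (the failure mode discussed above):
# DEVICE /dev/sd[abc]1
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=...
```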

L.

--
Luca Berra -- [EMAIL PROTECTED]
   Communication Media & Services S.r.l.
/"\
\ /  ASCII RIBBON CAMPAIGN
 X   AGAINST HTML MAIL
/ \


Re: problems with raid=noautodetect

2006-05-26 Thread Luca Berra

On Fri, May 26, 2006 at 09:53:08AM +0200, Luca Berra wrote:

On Tue, May 23, 2006 at 08:39:26AM +1000, Neil Brown wrote:

Presumably you have a 'DEVICE' line in mdadm.conf too?  What is it.
My first guess is that it isn't listing /dev/sdd? somehow.


Neil,
I am seeing a lot of people fall into this same error, and I would
propose a way of avoiding it:

1) make DEVICE partitions the default if no DEVICE line is specified.

oops,
just read your 2.5 announce, you already did that :)

2) deprecate the DEVICE keyword, issuing a warning when it is found in
the configuration file
3) introduce DEVICEFILTER or a similar keyword with the same meaning as
the actual DEVICE keyword
4) optionally add an EXCLUDEDEVICE keyword with the opposite meaning.





Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Mikael Abrahamsson

On Thu, 25 May 2006, Craig Hollabaugh wrote:

That did it! I set the partition FS Types from 'Linux' to 'Linux raid 
autodetect' after my last re-sync completed. Manually stopped and 
started the array. Things looked good, so I crossed my fingers and 
rebooted. The kernel found all the drives and all is happy here in 
Colorado.


Would it make sense for the raid code to somehow warn in the log when a
device in a raid set doesn't have the 'Linux raid autodetect' partition
type? If this was in dmesg, would you have spotted the problem before?


--
Mikael Abrahamsson    email: [EMAIL PROTECTED]


Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Craig Hollabaugh
I had no idea about this particular configuration requirement. None of
my reading mentioned setting the partition type. I originally created
the array in 1/2003 and don't remember having to set it. So, yes, more
debugging info in dmesg would have saved me days of
resync/tweak/reboot/resync cycles. (I'm not complaining, just very
relieved to be up and running again.)
 



On Fri, 2006-05-26 at 09:57 +0200, Mikael Abrahamsson wrote:
 On Thu, 25 May 2006, Craig Hollabaugh wrote:
 
  That did it! I set the partition FS Types from 'Linux' to 'Linux raid 
  autodetect' after my last re-sync completed. Manually stopped and 
  started the array. Things looked good, so I crossed my fingers and 
  rebooted. The kernel found all the drives and all is happy here in 
  Colorado.
 
 Would it make sense for the raid code to somehow warn in the log when a 
 device in a raid set doesn't have Linux raid autodetect partition type? 
 If this was in dmesg, would you have spotted the problem before?
 
-- 

Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com



Re: raid5 hang on get_active_stripe

2006-05-26 Thread dean gaudet
On Tue, 23 May 2006, Neil Brown wrote:

 I've spent all morning looking at this and while I cannot see what is
 happening I did find a couple of small bugs, so that is good...
 
 I've attached three patches.  The first two fix small bugs (I think).
 The last adds some extra information to
   /sys/block/mdX/md/stripe_cache_active
 
 They are against 2.6.16.11.
 
 If you could apply them and if the problem recurs, report the content
 of stripe_cache_active several times before and after changing it,
 just like you did last time, that might help throw some light on the
 situation.

i applied them against 2.6.16.18 and two days later i got my first hang... 
below is the stripe_cache foo.

thanks
-dean

neemlark:~# cd /sys/block/md4/md/
neemlark:/sys/block/md4/md# cat stripe_cache_active 
255
0 preread
bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active 
255
0 preread
bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active 
255
0 preread
bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active 
255
0 preread
bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active 
255
0 preread
bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_size 
256
neemlark:/sys/block/md4/md# echo 512 > stripe_cache_size
neemlark:/sys/block/md4/md# cat stripe_cache_active
474
187 preread
bitlist=0 delaylist=222
neemlark:/sys/block/md4/md# cat stripe_cache_active
438
222 preread
bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
438
222 preread
bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
469
222 preread
bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
512
72 preread
bitlist=160 delaylist=103
neemlark:/sys/block/md4/md# cat stripe_cache_active
1
0 preread
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
2
0 preread
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
0
0 preread
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
2
0 preread
bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# 

md4 : active raid5 sdd1[0] sde1[5](S) sdh1[4] sdg1[3] sdf1[2] sdc1[1]
  1562834944 blocks level 5, 128k chunk, algorithm 2 [5/5] [UUUUU]
  bitmap: 10/187 pages [40KB], 1024KB chunk
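The tuning step above amounts to writing a number into a sysfs attribute. A stand-in sketch (a scratch file replaces /sys/block/md4/md so it can run without an array; on a real system just cd to that directory):

```shell
# Stand-in for /sys/block/md4/md -- illustrative path only.
MD=/tmp/md-demo; mkdir -p "$MD"
echo 256 > "$MD/stripe_cache_size"    # the default seen in the transcript
cat "$MD/stripe_cache_size"           # prints 256
echo 512 > "$MD/stripe_cache_size"    # enlarge the raid5 stripe cache
cat "$MD/stripe_cache_size"           # prints 512
```

On a live array each cache entry pins memory for a full stripe across all member disks, so growing it trades RAM for fewer stalls in get_active_stripe.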


Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Mark Hahn
 I had no idea about this particular configuration requirement. None of

just to be clear: it's not a requirement.  if you want the very nice 
auto-assembling behavior, you need to designate the auto-assemblable 
partitions.  but you can assemble manually without 0xfd partitions
(even if that's in an initrd, for instance.)

I think the current situation is good, since there is some danger of 
going too far.  for instance, testing each partition to see whether 
it contains a valid superblock would be pretty crazy, right?  requiring
either the auto-assemble-me partition type, or explicit partitions 
given in a config file is a happy medium...



Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Craig Hollabaugh
On Fri, 2006-05-26 at 12:45 -0400, Mark Hahn wrote:
 I think the current situation is good, since there is some danger of 
 going too far.  for instance, testing each partition to see whether 
 it contains a valid superblock would be pretty crazy, right?  requiring
 either the auto-assemble-me partition type, or explicit partitions 
 given in a config file is a happy medium...
 

I created my array in 1/2003; I don't know which versions of the kernel
or mdadm I was using then.

In my situation over the past few days:
  kernel 2.4.30 kicked non-fresh
  kernel 2.6.11.8 kicked non-fresh
  kernel 2.6.18.8 didn't mention anything, just skipped my 'linux'
partitions

These kernels auto-assemble prior to mounting /. So the kernel doesn't
consult my /etc/mdadm/mdadm.conf file. Is this correct?



 


Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Bill Davidsen

Mikael Abrahamsson wrote:


On Thu, 25 May 2006, Craig Hollabaugh wrote:

That did it! I set the partition FS Types from 'Linux' to 'Linux raid 
autodetect' after my last re-sync completed. Manually stopped and 
started the array. Things looked good, so I crossed my fingers and 
rebooted. The kernel found all the drives and all is happy here in 
Colorado.



Would it make sense for the raid code to somehow warn in the log when 
a device in a raid set doesn't have Linux raid autodetect partition 
type? If this was in dmesg, would you have spotted the problem before?


As long as it is written where logwatch will see it, not recognize it,
and therefore report it as an unmatched line... People who don't read
their logwatch reports get no sympathy from me.


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Mark Hahn
 I created my array in 1/2003, don't know versions of kernel or mdadm I
 was using then.

did you have /etc/*md* related config files?  some distros use
them to assemble during boot (not quite the same as 0xfd auto-assembly,
but still pretty auto).

 In my situation over the past few days.
   kernel 2.4.30 kicked non-fresh
   kernel 2.6.11.8 kicked non-fresh
   kernel 2.6.18.8 didn't mention anything, just skipped my 'linux'
 partitions

 These kernels auto-assemble prior to mounting /. So the kernel doesn't
 consult my
 /etc/mdadm/mdadm.conf file. Is this correct?

yes - the kernel traditionally doesn't, of its own accord, read files.
most stuff under /etc is input to user-level tools that run during
boot to instruct the kernel how to configure things.  distros have,
in the past, had boot-time scripts that would run mdadm and thus
read your mdadm.conf (or the raid config files that predate mdadm...)

so perhaps your observed change in behavior had to do with distro changes...



Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Craig Hollabaugh
Mikael and others,
I forgot to answer your question from a previous post. Yes, if I had
received a warning in dmesg, I would have spotted this problem. Or at
least been pointed to something to research. When I switched to the
newest kernel, I didn't even get the kicking non-fresh message, just a
list of added drives. The lack of information got me even more
concerned. 

From a user perspective, here's where the disconnect occurred for me.
After the re-sync, when my array was stable and running with a spare, I
could start it and stop it, mount it and unmount it without any issues.
Whew, things are looking good, my data is safe. I thought everything was
good to go. Then I rebooted the machine and my array came up degraded.
mdadm -D reported something completely different from what it reported
before the reboot. dmesg gave few clues about the kernel raid build
process.

The disconnect for me occurs between mdadm assembling the array from
userspace and the kernel auto-detecting, binding and running. I was
under the impression that mdadm and the kernel assemble arrays in the
same fashion. In my situation where my new drive's partition types were
different, that's not quite true. 

Thanks for the help.
Craig
ps. I'm old-school here, none of my 10+ Linux hosts run logwatch, dmesg
is fine for me.



On Fri, 2006-05-26 at 13:32 -0400, Bill Davidsen wrote:
 Mikael Abrahamsson wrote:
 
  On Thu, 25 May 2006, Craig Hollabaugh wrote:
 
  That did it! I set the partition FS Types from 'Linux' to 'Linux raid 
  autodetect' after my last re-sync completed. Manually stopped and 
  started the array. Things looked good, so I crossed my fingers and 
  rebooted. The kernel found all the drives and all is happy here in 
  Colorado.
 
 
  Would it make sense for the raid code to somehow warn in the log when 
  a device in a raid set doesn't have Linux raid autodetect partition 
  type? If this was in dmesg, would you have spotted the problem before?
 
 As long as it is written where logwatch will see it, not recognize it, 
 and report it... People who don't read their logwatch reports get no 
 sympathy from me.
 


Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Craig Hollabaugh
On Fri, 2006-05-26 at 13:30 -0400, Mark Hahn wrote:
 yes - the kernel traditionally doesn't, of its own accord, read files.
 most stuff under /etc are inputs to user-level tools that run during 
 boot to instruct the kernel how to configure things.  distros have,
 in the past, had boot-time scripts that would run mdadm and thus 
 read your mdadm.conf (or the raid config files that predate mdadm...)
 
 so perhaps your observed change in behavior had to do with distro changes...


I agree. There must have been a distro change over the past 3 years
concerning the array build process. I seem to remember a great concern
of mine to store my mdadm.conf off-site, just in case my rootfs drive
died (which it did, of course). I never set the partition types to
'Linux raid autodetect' either. So there have been a couple of changes
over the years, probably more.

I will say this. My two 1TB, 14-drive servers have been extremely
reliable for the past 3.5 years. Occasionally I replace a power supply, but
that's about it. Just for your information, a drive will drop out of the
array when the power supply starts to droop. When I have a drive
failure, I pull the drive, externally run a bad block test and replace
if necessary. If no errors, I replace the power supply and reinsert the
old drive back into the array and rebuild. This has happened about 5
times for my 2 servers over the past 3 years. 






 
 


Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Luca Berra

On Fri, May 26, 2006 at 11:06:21AM -0600, Craig Hollabaugh wrote:

On Fri, 2006-05-26 at 12:45 -0400, Mark Hahn wrote:
I think the current situation is good, since there is some danger of 
going too far.  for instance, testing each partition to see whether 
it contains a valid superblock would be pretty crazy, right?  requiring
either the auto-assemble-me partition type, or explicit partitions 
given in a config file is a happy medium...




I created my array in 1/2003, don't know versions of kernel or mdadm I
was using then.

In my situation over the past few days.
 kernel 2.4.30 kicked non-fresh
 kernel 2.6.11.8 kicked non-fresh
 kernel 2.6.18.8 didn't mention anything, just skipped my 'linux'
partitions

These kernels auto-assemble prior to mounting /. So the kernel doesn't
consult my 
/etc/mdadm/mdadm.conf file. Is this correct? 

I strongly believe it is not correct to let the kernel auto-assemble
devices. Kernel auto-assembly should be disabled, and activation should
be handled by mdadm only!

L.



Re: RAID5 kicks non-fresh drives

2006-05-26 Thread Mark Hahn
 i strongly believe it is not correct to let kernel auto-assemble devices
 kernel auto-assembly should be disable and activation should be handled
 by mdadm only!

it's a convenience/safety tradeoff, like so many other cases.
without kernel auto-assembly, it's somewhat more annoying to 
boot onto MD raid, right?  you are forced to put MD config stuff
into your initrd, etc.

I don't see why auto-assembly is such a bad thing.  it means you
shouldn't leave stray 0xfd partitions sitting around, but that's OK,
since 0xfd means exactly, and nothing but, 'please auto-assemble this'.
no worse than leaving inconsistent or erroneous stuff in your
mdadm.conf or /etc/rc.d/rc.sysinit.

the only argument I see against (kernel) auto-assembly is the
general principle of moving things out of the kernel where possible.
but that's not a hard-and-fast rule anyway, so...



Re: raid5 hang on get_active_stripe

2006-05-26 Thread Neil Brown
On Friday May 26, [EMAIL PROTECTED] wrote:
 On Tue, 23 May 2006, Neil Brown wrote:
 
 i applied them against 2.6.16.18 and two days later i got my first hang... 
 below is the stripe_cache foo.
 
 thanks
 -dean
 
 neemlark:~# cd /sys/block/md4/md/
 neemlark:/sys/block/md4/md# cat stripe_cache_active 
 255
 0 preread
 bitlist=0 delaylist=255
 neemlark:/sys/block/md4/md# cat stripe_cache_active 
 255
 0 preread
 bitlist=0 delaylist=255
 neemlark:/sys/block/md4/md# cat stripe_cache_active 
 255
 0 preread
 bitlist=0 delaylist=255

Thanks.  This narrows it down quite a bit... too much, in fact: I can
now say for sure that this cannot possibly happen :-)

Two things that might be helpful:
  1/ Do you have any other patches on 2.6.16.18 other than the 3 I
sent you?  If you do I'd like to see them, just in case.
  2/ The message.gz you sent earlier with the
       echo t > /proc/sysrq-trigger
     trace in it didn't contain information about md4_raid5 - the
     controlling thread for that array.  It must have missed out
     due to a buffer overflowing.  Next time it happens, could you
     get this trace again and see if you can find out what
     md4_raid5 is doing.  Maybe do the 'echo t' several times.
     I think that you need a kernel recompile to make the dmesg
     buffer larger.

Thanks for your patience - this must be very frustrating for you.

NeilBrown


Re: raid5 hang on get_active_stripe

2006-05-26 Thread dean gaudet
On Sat, 27 May 2006, Neil Brown wrote:

 On Friday May 26, [EMAIL PROTECTED] wrote:
  On Tue, 23 May 2006, Neil Brown wrote:
  
  i applied them against 2.6.16.18 and two days later i got my first hang... 
  below is the stripe_cache foo.
  
  thanks
  -dean
  
  neemlark:~# cd /sys/block/md4/md/
  neemlark:/sys/block/md4/md# cat stripe_cache_active 
  255
  0 preread
  bitlist=0 delaylist=255
  neemlark:/sys/block/md4/md# cat stripe_cache_active 
  255
  0 preread
  bitlist=0 delaylist=255
  neemlark:/sys/block/md4/md# cat stripe_cache_active 
  255
  0 preread
  bitlist=0 delaylist=255
 
 Thanks.  This narrows it down quite a bit... too much, in fact: I can
 now say for sure that this cannot possibly happen :-)

heheh.  fwiw the box has traditionally been rock solid.. it's ancient 
though... dual p3 750 w/440bx chipset and pc100 ecc memory... 3ware 7508 
w/seagate 400GB disks... i really don't suspect the hardware all that much 
because the freeze seems to be rather consistent as to time of day 
(overnight while i've got 3x rdiff-backup, plus bittorrent, plus updatedb 
going).  unfortunately it doesn't happen every time... but every time i've 
unstuck the box i've noticed those processes going.

other tidbits... md4 is an lvm2 PV ... there are two LVs, one with ext3
and one with xfs.


 Two things that might be helpful:
   1/ Do you have any other patches on 2.6.16.18 other than the 3 I
 sent you?  If you do I'd like to see them, just in case.

it was just 2.6.16.18 plus the 3 you sent... i attached the .config
(it's rather full -- based off debian kernel .config).

maybe there's a compiler bug:

gcc version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)


   2/ The message.gz you sent earlier with the
        echo t > /proc/sysrq-trigger
      trace in it didn't contain information about md4_raid5 - the
      controlling thread for that array.  It must have missed out
      due to a buffer overflowing.  Next time it happens, could you
      get this trace again and see if you can find out what
      md4_raid5 is doing.  Maybe do the 'echo t' several times.
      I think that you need a kernel recompile to make the dmesg
      buffer larger.

ok i'll set CONFIG_LOG_BUF_SHIFT=18 and rebuild ...
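That rebuild amounts to a one-line change in the kernel config (18 gives a 2^18-byte, i.e. 256 KB, dmesg ring buffer):

```
# kernel .config fragment -- enlarges the printk log buffer
CONFIG_LOG_BUF_SHIFT=18
```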

note that i'm going to include two more patches in this next kernel:

http://lkml.org/lkml/2006/5/23/42
http://arctic.org/~dean/patches/linux-2.6.16.5-no-treason.patch

the first was the Jens Axboe patch you mentioned here recently (for
accounting with i/o barriers)... and the second gets rid of the tcp
treason uncloaked messages.


 Thanks for your patience - this must be very frustrating for you.

fortunately i'm the primary user of this box... and the bug doesn't
corrupt anything... and i can unstick it easily :)  so it's not all that
frustrating actually.

-dean

[Attachment: config.gz -- binary data]