Re: mdadm and 2.4 kernel?
On Thursday May 25, [EMAIL PROTECTED] wrote:
> Hi, for various reasons I'll need to run mdadm on a 2.4 kernel. I
> currently have a 2.4.32 kernel. Take a look:
>
> [EMAIL PROTECTED]:~# mdadm --create --verbose /dev/md0 --level=1 \
>     --bitmap=/root/md0bitmap -n 2 /dev/nda /dev/ndb --force --assume-clean
> mdadm: /dev/nda appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
> mdadm: /dev/ndb appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
> mdadm: size set to 39118144K
> Continue creating array? y
> mdadm: Warning - bitmaps created on this kernel are not portable
>   between different architectured. Consider upgrading the Linux kernel.
> mdadm: Cannot set bitmap file for /dev/md0: No such device

2.4 does not support bitmaps (nor do early 2.6 kernels).

> [EMAIL PROTECTED]:~# mdadm --create --verbose /dev/md0 --level=1 -n 2 \
>     /dev/nda /dev/ndb --force --assume-clean
> mdadm: /dev/nda appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
> mdadm: /dev/ndb appears to be part of a raid array:
>     level=raid1 devices=2 ctime=Thu May 25 20:10:47 2006
> mdadm: size set to 39118144K
> Continue creating array? y
> mdadm: SET_ARRAY_INFO failed for /dev/md0: File exists

It seems /dev/md0 is already active somehow. Try "mdadm -S /dev/md0"
first. What does "cat /proc/mdstat" say?

NeilBrown

> Obviously the devices /dev/nda and /dev/ndb exist (I can run fdisk on
> them). Can someone help me? Thanks.
>
> Stefano.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
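Neil's advice amounts to a simple check: before re-creating the array, confirm whether /dev/md0 is already active in /proc/mdstat. A minimal sketch of that check; the inlined mdstat sample is hypothetical (in real use you would read /proc/mdstat itself):

```shell
# Check whether md0 is already active before attempting mdadm --create.
# The mdstat content below is a made-up sample standing in for
# /proc/mdstat, so the logic can be shown without a real md device.
mdstat='Personalities : [raid1]
md0 : active raid1 ndb[1] nda[0]
      39118144 blocks [2/2] [UU]
unused devices: <none>'

if printf '%s\n' "$mdstat" | grep -q '^md0 : active'; then
    echo "md0 is active; stop it first with: mdadm -S /dev/md0"
else
    echo "md0 is not active; safe to create"
fi
```

On a real system the same test is `grep -q '^md0 : active' /proc/mdstat`, which explains the "File exists" error: the create ioctl fails because the device is already assembled.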
Re: problems with raid=noautodetect
On Tue, May 23, 2006 at 08:39:26AM +1000, Neil Brown wrote:
> Presumably you have a 'DEVICE' line in mdadm.conf too? What is it?
> My first guess is that it isn't listing /dev/sdd? somehow.

Neil, I am seeing a lot of people fall into this same error, and I would
propose a way of avoiding the problem:

1) make "DEVICE partitions" the default if no DEVICE line is specified.
2) deprecate the DEVICE keyword, issuing a warning when it is found in the
   configuration file.
3) introduce DEVICEFILTER or a similar keyword with the same meaning as
   the current DEVICE keyword.
4) optionally add an EXCLUDEDEVICE keyword with the opposite meaning.

L.
--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
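For readers hitting the error Luca describes, the shape of the config file under discussion is roughly the following. This is an illustrative sketch, not taken from the thread; the device names and UUID are placeholders:

```
# /etc/mdadm/mdadm.conf -- illustrative example
# "DEVICE partitions" scans every partition the kernel knows about
# (the behaviour proposed above as the default). An explicit DEVICE
# line that forgets one member, e.g. /dev/sdd1, is exactly what makes
# assembly silently skip that disk.
DEVICE partitions
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=00000000:00000000:00000000:00000000
```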
Re: problems with raid=noautodetect
On Fri, May 26, 2006 at 09:53:08AM +0200, Luca Berra wrote:
> On Tue, May 23, 2006 at 08:39:26AM +1000, Neil Brown wrote:
>> Presumably you have a 'DEVICE' line in mdadm.conf too? What is it?
>> My first guess is that it isn't listing /dev/sdd? somehow.
>
> Neil, I am seeing a lot of people fall into this same error, and I would
> propose a way of avoiding the problem:
>
> 1) make "DEVICE partitions" the default if no DEVICE line is specified.

oops, just read your 2.5 announce, you already did that :)

> 2) deprecate the DEVICE keyword, issuing a warning when it is found in
>    the configuration file.
> 3) introduce DEVICEFILTER or a similar keyword with the same meaning as
>    the current DEVICE keyword.
> 4) optionally add an EXCLUDEDEVICE keyword with the opposite meaning.

--
Luca Berra -- [EMAIL PROTECTED]
Re: RAID5 kicks non-fresh drives
On Thu, 25 May 2006, Craig Hollabaugh wrote:
> That did it! I set the partition FS types from 'Linux' to 'Linux raid
> autodetect' after my last re-sync completed. Manually stopped and
> started the array. Things looked good, so I crossed my fingers and
> rebooted. The kernel found all the drives and all is happy here in
> Colorado.
>
> Would it make sense for the raid code to somehow warn in the log when a
> device in a raid set doesn't have the 'Linux raid autodetect' partition
> type?

If this was in dmesg, would you have spotted the problem before?

--
Mikael Abrahamsson    email: [EMAIL PROTECTED]
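Pending such a kernel warning, the same check can be approximated from userspace: list each intended RAID member's partition type and flag anything that is not 0xfd ('Linux raid autodetect'). A sketch; the "device type" pairs are inlined sample data (on a real system they would come from something like `sfdisk --print-id /dev/sdX N` per partition, which is an assumption about your tooling):

```shell
# Flag would-be RAID members whose partition type is not fd, the exact
# situation that bit Craig: a 'Linux' (83) partition is silently skipped
# by kernel autodetect. Sample data stands in for real sfdisk output.
parts='sdb1 fd
sdc1 fd
sdd1 83'

printf '%s\n' "$parts" | while read -r dev type; do
    if [ "$type" != "fd" ]; then
        echo "warning: /dev/$dev has type $type, not fd; autodetect will skip it"
    fi
done
```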
Re: RAID5 kicks non-fresh drives
I had no idea about this particular configuration requirement. None of my
reading mentioned setting the partition type. I originally created the
array in 1/2003 and don't remember having to set it.

So, yes, more debugging info in dmesg would have saved me days of
resync/tweak/reboot/resync cycles. (I'm not complaining, just very
relieved to be up and running again.)

On Fri, 2006-05-26 at 09:57 +0200, Mikael Abrahamsson wrote:
> Would it make sense for the raid code to somehow warn in the log when a
> device in a raid set doesn't have the 'Linux raid autodetect' partition
> type?
>
> If this was in dmesg, would you have spotted the problem before?

--
Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com
Re: raid5 hang on get_active_stripe
On Tue, 23 May 2006, Neil Brown wrote:
> I've spent all morning looking at this and while I cannot see what is
> happening I did find a couple of small bugs, so that is good...
>
> I've attached three patches. The first two fix small bugs (I think).
> The last adds some extra information to
> /sys/block/mdX/md/stripe_cache_active. They are against 2.6.16.11. If
> you could apply them and, if the problem recurs, report the content of
> stripe_cache_active several times before and after changing it, just
> like you did last time, that might help throw some light on the
> situation.

i applied them against 2.6.16.18 and two days later i got my first hang...
below is the stripe_cache foo. thanks

-dean

neemlark:~# cd /sys/block/md4/md/
neemlark:/sys/block/md4/md# cat stripe_cache_active
255 0 preread bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active
255 0 preread bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active
255 0 preread bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active
255 0 preread bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_active
255 0 preread bitlist=0 delaylist=255
neemlark:/sys/block/md4/md# cat stripe_cache_size
256
neemlark:/sys/block/md4/md# echo 512 > stripe_cache_size
neemlark:/sys/block/md4/md# cat stripe_cache_active
474 187 preread bitlist=0 delaylist=222
neemlark:/sys/block/md4/md# cat stripe_cache_active
438 222 preread bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
438 222 preread bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
469 222 preread bitlist=0 delaylist=72
neemlark:/sys/block/md4/md# cat stripe_cache_active
512 72 preread bitlist=160 delaylist=103
neemlark:/sys/block/md4/md# cat stripe_cache_active
1 0 preread bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
2 0 preread bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
0 0 preread bitlist=0 delaylist=0
neemlark:/sys/block/md4/md# cat stripe_cache_active
2 0 preread bitlist=0 delaylist=0
neemlark:/sys/block/md4/md#

md4 : active raid5 sdd1[0] sde1[5](S) sdh1[4] sdg1[3] sdf1[2] sdc1[1]
      1562834944 blocks level 5, 128k chunk, algorithm 2 [5/5] [U]
      bitmap: 10/187 pages [40KB], 1024KB chunk
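The unstick procedure in the transcript (sample stripe_cache_active a few times, then grow stripe_cache_size) can be wrapped in a small helper. A sketch only; the helper name is made up, and the sysfs path and the value 512 follow the transcript:

```shell
# Hypothetical helper reproducing the debugging steps above: sample
# stripe_cache_active, then enlarge stripe_cache_size (which is what
# unstuck the array in the transcript), then sample once more.
sample_and_grow() {
    md=$1      # md sysfs directory, e.g. /sys/block/md4/md
    new=$2     # new stripe cache size, e.g. 512
    for i in 1 2 3; do
        cat "$md/stripe_cache_active"
    done
    echo "$new" > "$md/stripe_cache_size"
    cat "$md/stripe_cache_active"
}

# real use (as root, on an affected array):
#   sample_and_grow /sys/block/md4/md 512
```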
Re: RAID5 kicks non-fresh drives
> I had no idea about this particular configuration requirement.

just to be clear: it's not a requirement. if you want the very nice
auto-assembling behavior, you need to designate the auto-assemblable
partitions, but you can assemble manually without 0xfd partitions (even
if that's done in an initrd, for instance).

I think the current situation is good, since there is some danger of
going too far. for instance, testing each partition to see whether it
contains a valid superblock would be pretty crazy, right? requiring
either the auto-assemble-me partition type, or explicit partitions given
in a config file, is a happy medium...
Re: RAID5 kicks non-fresh drives
On Fri, 2006-05-26 at 12:45 -0400, Mark Hahn wrote:
> I think the current situation is good, since there is some danger of
> going too far. [...] requiring either the auto-assemble-me partition
> type, or explicit partitions given in a config file, is a happy
> medium...

I created my array in 1/2003; I don't know which versions of the kernel
or mdadm I was using then. In my situation over the past few days:

kernel 2.4.30    kicked non-fresh
kernel 2.6.11.8  kicked non-fresh
kernel 2.6.18.8  didn't mention anything, just skipped my 'linux' partitions

These kernels auto-assemble prior to mounting /. So the kernel doesn't
consult my /etc/mdadm/mdadm.conf file. Is this correct?

--
Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com
Re: RAID5 kicks non-fresh drives
Mikael Abrahamsson wrote:
> On Thu, 25 May 2006, Craig Hollabaugh wrote:
>> Would it make sense for the raid code to somehow warn in the log when
>> a device in a raid set doesn't have the 'Linux raid autodetect'
>> partition type?
>
> If this was in dmesg, would you have spotted the problem before?

As long as it is written where logwatch will see it, recognize it, and
report it... People who don't read their logwatch reports get no sympathy
from me.

--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: RAID5 kicks non-fresh drives
> I created my array in 1/2003, don't know versions of kernel or mdadm I
> was using then.

did you have /etc/*md* related config files? some distros use them to
assemble during boot (not quite the same as 0xfd auto-assembly, but still
pretty auto).

> In my situation over the past few days:
> kernel 2.4.30 kicked non-fresh
> kernel 2.6.11.8 kicked non-fresh
> kernel 2.6.18.8 didn't mention anything, just skipped my 'linux' partitions
>
> These kernels auto-assemble prior to mounting /. So the kernel doesn't
> consult my /etc/mdadm/mdadm.conf file. Is this correct?

yes - the kernel traditionally doesn't, of its own accord, read files.
most stuff under /etc are inputs to user-level tools that run during boot
to instruct the kernel how to configure things. distros have, in the
past, had boot-time scripts that would run mdadm and thus read your
mdadm.conf (or the raid config files that predate mdadm...) so perhaps
your observed change in behavior had to do with distro changes...
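The boot-time userspace assembly Mark describes is usually a few lines of init script: check for a config file, then let mdadm read it. A sketch under stated assumptions: the `should_assemble` helper is made up here so the decision logic is testable, and the mdadm invocation is the standard `--assemble --scan` form from mdadm(8):

```shell
# Sketch of distro-style boot assembly: the kernel never reads /etc;
# an init script checks mdadm.conf and calls mdadm from userspace.
should_assemble() {
    # assemble only if a config file with ARRAY lines exists
    conf=$1
    [ -f "$conf" ] && grep -q '^ARRAY' "$conf"
}

# in a real init script (as root, before mounting filesystems on md):
#   should_assemble /etc/mdadm/mdadm.conf && mdadm --assemble --scan
```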
Re: RAID5 kicks non-fresh drives
Mikael and others, I forgot to answer your question from a previous post.
Yes, if I had received a warning in dmesg, I would have spotted this
problem, or at least been pointed to something to research. When I
switched to the newest kernel, I didn't even get the "kicking non-fresh"
message, just a list of added drives. The lack of information got me even
more concerned.

From a user perspective, here's where the disconnect occurred for me.
After the re-sync, my array was stable and running with a spare; I could
start it and stop it, mount it and unmount it without any issues. Whew,
things are looking good, my data is safe. I thought everything was good
to go. Then I reboot the machine and my array comes up degraded. mdadm -D
reports something completely different than what it reported before the
reboot, and dmesg gives few clues about the kernel raid build process.

The disconnect for me occurs between mdadm assembling the array from
userspace and the kernel auto-detecting, binding and running it. I was
under the impression that mdadm and the kernel assemble arrays in the
same fashion. In my situation, where my new drive's partition types were
different, that's not quite true.

Thanks for the help.
Craig

ps. I'm old-school here; none of my 10+ Linux hosts run logwatch. dmesg
is fine for me.

On Fri, 2006-05-26 at 13:32 -0400, Bill Davidsen wrote:
> As long as it is written where logwatch will see it, recognize it, and
> report it... People who don't read their logwatch reports get no
> sympathy from me.

--
Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com
Re: RAID5 kicks non-fresh drives
On Fri, 2006-05-26 at 13:30 -0400, Mark Hahn wrote:
> yes - the kernel traditionally doesn't, of its own accord, read files.
> [...] distros have, in the past, had boot-time scripts that would run
> mdadm and thus read your mdadm.conf (or the raid config files that
> predate mdadm...) so perhaps your observed change in behavior had to do
> with distro changes...

I agree. There must have been a distro change over the past 3 years
concerning the array build process. I seem to remember a great concern of
mine to store my mdadm.conf off-site, just in case my rootfs drive died
(which it did, of course). I also never set the partition types to 'Linux
raid autodetect' either. So there have been a couple of changes over the
years, probably more.

I will say this: my two 1TB 14-drive servers have been extremely reliable
for the past 3.5 years. Occasionally I replace a power supply, but that's
about it. For your information, a drive will drop out of the array when
the power supply starts to droop. When I have a drive failure, I pull the
drive, externally run a bad-block test, and replace the drive if
necessary. If there are no errors, I replace the power supply, reinsert
the old drive back into the array, and rebuild. This has happened about 5
times across my 2 servers over the past 3 years.

--
Dr. Craig Hollabaugh, [EMAIL PROTECTED], 970 240 0509
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com
Re: RAID5 kicks non-fresh drives
On Fri, May 26, 2006 at 11:06:21AM -0600, Craig Hollabaugh wrote:
> These kernels auto-assemble prior to mounting /. So the kernel doesn't
> consult my /etc/mdadm/mdadm.conf file. Is this correct?

i strongly believe it is not correct to let the kernel auto-assemble
devices. kernel auto-assembly should be disabled, and activation should
be handled by mdadm only!

L.
--
Luca Berra -- [EMAIL PROTECTED]
Re: RAID5 kicks non-fresh drives
> i strongly believe it is not correct to let the kernel auto-assemble
> devices. kernel auto-assembly should be disabled, and activation should
> be handled by mdadm only!

it's a convenience/safety tradeoff, like so many other cases. without
kernel auto-assembly, it's somewhat more annoying to boot onto MD raid,
right? you are forced to put MD config stuff into your initrd, etc.

I don't see why auto-assembly is such a bad thing. it means you shouldn't
leave 0xfd partitions sitting around, but that's OK, since 0xfd means
exactly, and nothing but, "please autoassemble this". no worse than
leaving inconsistent or erroneous stuff in your mdadm.conf or
/etc/rc.d/rc.sysinit.

the only argument I see against (kernel) auto-assembly is the general
principle of moving things out of the kernel where possible. but that's
not a hard/fast rule anyway, so...
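For anyone who does want the mdadm-only arrangement Luca argues for, the thread's own subject line names the switch: booting with `raid=noautodetect` keeps the kernel away from 0xfd partitions, leaving activation to mdadm. An illustrative bootloader entry; the kernel image name and root device are placeholders:

```
# grub menu.lst entry (illustrative): disable in-kernel 0xfd
# auto-assembly so mdadm alone activates arrays at boot
kernel /boot/vmlinuz-2.6.16 root=/dev/sda1 ro raid=noautodetect
```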
Re: raid5 hang on get_active_stripe
On Friday May 26, [EMAIL PROTECTED] wrote:
> i applied them against 2.6.16.18 and two days later i got my first
> hang... below is the stripe_cache foo. thanks
>
> -dean
>
> neemlark:~# cd /sys/block/md4/md/
> neemlark:/sys/block/md4/md# cat stripe_cache_active
> 255 0 preread bitlist=0 delaylist=255
> neemlark:/sys/block/md4/md# cat stripe_cache_active
> 255 0 preread bitlist=0 delaylist=255
> neemlark:/sys/block/md4/md# cat stripe_cache_active
> 255 0 preread bitlist=0 delaylist=255

Thanks. This narrows it down quite a bit... too much in fact: I can now
say for sure that this cannot possibly happen :-)

Two things that might be helpful:

1/ Do you have any other patches on 2.6.16.18 other than the 3 I sent
   you? If you do I'd like to see them, just in case.

2/ The message.gz you sent earlier with the "echo t > /proc/sysrq-trigger"
   trace in it didn't contain information about md4_raid5 - the
   controlling thread for that array. It must have missed out due to a
   buffer overflowing. Next time it happens, could you try to get this
   trace again and see if you can find out what md4_raid5 is doing?
   Maybe do the 'echo t' several times. I think you need a kernel
   recompile to make the dmesg buffer larger.

Thanks for your patience - this must be very frustrating for you.

NeilBrown
Re: raid5 hang on get_active_stripe
On Sat, 27 May 2006, Neil Brown wrote:
> Thanks. This narrows it down quite a bit... too much in fact: I can now
> say for sure that this cannot possibly happen :-)

heheh. fwiw the box has traditionally been rock solid.. it's ancient
though... dual p3 750 w/440bx chipset and pc100 ecc memory... 3ware 7508
w/seagate 400GB disks... i really don't suspect the hardware all that
much because the freeze seems to be rather consistent as to time of day
(overnight while i've got 3x rdiff-backup, plus bittorrent, plus updatedb
going). unfortunately it doesn't happen every time... but every time i've
unstuck the box i've noticed those processes going.

other tidbits... md4 is a lvm2 PV ... there are two LVs, one with ext3
and one with xfs.

> 1/ Do you have any other patches on 2.6.16.18 other than the 3 I sent
>    you? If you do I'd like to see them, just in case.

it was just 2.6.16.18 plus the 3 you sent... i attached the .config (it's
rather full -- based off the debian kernel .config). maybe there's a
compiler bug: gcc version 4.0.4 20060507 (prerelease) (Debian 4.0.3-3)

> 2/ The message.gz you sent earlier with the "echo t > /proc/sysrq-trigger"
>    trace in it didn't contain information about md4_raid5 [...] I think
>    you need a kernel recompile to make the dmesg buffer larger.

ok i'll set CONFIG_LOG_BUF_SHIFT=18 and rebuild ... note that i'm going
to include two more patches in this next kernel:

http://lkml.org/lkml/2006/5/23/42
http://arctic.org/~dean/patches/linux-2.6.16.5-no-treason.patch

the first was the Jens Axboe patch you mentioned here recently (for
accounting with i/o barriers)... and the second gets rid of the "tcp
treason uncloaked" messages.

> Thanks for your patience - this must be very frustrating for you.

fortunately i'm the primary user of this box... and the bug doesn't
corrupt anything... and i can unstick it easily :) so it's not all that
frustrating actually.

-dean

[attachment: config.gz]
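The rebuild dean describes amounts to a one-line kernel config change. A sketch of the relevant `.config` fragment; CONFIG_MAGIC_SYSRQ is included on the assumption that the sysrq-t dump from the thread is in use:

```
# .config fragment: enlarge the printk ring buffer to 2^18 = 256 KB so a
# full "echo t > /proc/sysrq-trigger" task dump fits in dmesg
CONFIG_LOG_BUF_SHIFT=18
CONFIG_MAGIC_SYSRQ=y
```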