Problem in creating RAID5 MD array with kernel 2.6.15
Hi Folks,

I am trying to create a RAID5 array using mdadm on kernel 2.6.15 as:

# mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt -l5 -n4 /dev/sd{a,b,c,d}

But when I execute this command I get the following error:

mdadm: RUN_ARRAY failed: Cannot allocate memory

dmesg shows:

Apr 11 00:39:40 localhost kernel: md: bind<sda>
Apr 11 00:39:40 localhost kernel: md: bind<sdb>
Apr 11 00:39:40 localhost kernel: md: bind<sdc>
Apr 11 00:39:40 localhost kernel: raid5: automatically using best checksumming function: generic_sse
Apr 11 00:39:40 localhost kernel:    generic_sse: 4111.000 MB/sec
Apr 11 00:39:40 localhost kernel: raid5: using function: generic_sse (4111.000 MB/sec)
Apr 11 00:39:40 localhost kernel: md: raid5 personality registered as nr 4
Apr 11 00:39:41 localhost kernel: md0: bitmap file is out of date (0 < 1) -- forcing full recovery
Apr 11 00:39:41 localhost kernel: md0: failed to create bitmap (-12)
Apr 11 00:39:41 localhost kernel: md: pers->run() failed ...
Apr 11 00:39:41 localhost kernel: md: md0 stopped.
Apr 11 00:39:41 localhost kernel: md: unbind<sdc>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sdc)
Apr 11 00:39:41 localhost kernel: md: unbind<sdb>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sdb)
Apr 11 00:39:41 localhost kernel: md: unbind<sda>
Apr 11 00:39:41 localhost kernel: md: export_rdev(sda)

I am not able to work out why mdadm is failing. Please suggest some pointers so that I can solve this problem.

Thanks,
Yogesh Pahilwan
Re: Problem in creating RAID5 MD array with kernel 2.6.15
On Tuesday April 11, [EMAIL PROTECTED] wrote:
> I am trying to create a RAID5 array using mdadm on kernel 2.6.15 as:
> # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt -l5 -n4 /dev/sd{a,b,c,d}
> But when I execute this command I get the following error:
> mdadm: RUN_ARRAY failed: Cannot allocate memory

How big are your devices?

Try setting a larger bitmap chunk size, --bitmap-chunk=1024 maybe.

NeilBrown
RE: Problem in creating RAID5 MD array with kernel 2.6.15
Hi Neil,

I have set --bitmap-chunk=1024 and the RAID5 array gets created successfully.

But why do I have to set --bitmap-chunk for large devices such as the 500GB disks in my case? What is the default value of --bitmap-chunk?

Thanks,
Yogesh
RE: Problem in creating RAID5 MD array with kernel 2.6.15
On Tuesday April 11, [EMAIL PROTECTED] wrote:
> I have set --bitmap-chunk=1024 and the RAID5 array gets created successfully.

Good.

> But why do I have to set --bitmap-chunk for large devices such as the
> 500GB disks in my case? What is the default value of --bitmap-chunk?

4, which is probably too low.

For every 2048 chunks, md potentially needs to allocate one page. md also needs to allocate a table to hold all these pages.

At a chunk size of 4K, your 500GB would use 125 million chunks. That's about 64000 pages - but these are only allocated on demand, and we can survive an allocation failure. However the table would need 4 bytes per page, or about 250K. Allocating a 250K table is unlikely to succeed due to memory fragmentation. With 1024K chunks you only need about 1K, which is easy.

You could safely go down to 256K chunks, but I'm not sure it would gain much.

I have put a note on my mdadm todo list to choose a more sensible default chunk size which limits the number of chunks to 2 million.

NeilBrown
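To make the arithmetic above concrete, here is a small stand-alone C sketch (illustrative only, not md's actual code) that reproduces these numbers for a roughly 500GB component device:

#include <stdio.h>

/* Illustrative only: reproduces the bitmap sizing arithmetic above. */
int main(void)
{
    unsigned long long dev_kb   = 500ULL * 1000 * 1000; /* ~500GB device, in KB  */
    unsigned long long chunk_kb = 4;                    /* default --bitmap-chunk */

    unsigned long long chunks = dev_kb / chunk_kb;      /* bitmap chunks          */
    unsigned long long pages  = (chunks + 2047) / 2048; /* 2048 chunks per page   */
    unsigned long long table  = pages * 4;              /* 4 bytes per page       */

    printf("chunks=%llu pages=%llu table=%llu bytes\n", chunks, pages, table);
    /* chunk_kb = 4    -> ~125 million chunks, ~61000 pages, ~250K table */
    /* chunk_kb = 1024 -> ~488000 chunks, ~239 pages, ~1K table          */
    return 0;
}

With the default 4K bitmap chunk the page table itself needs one ~250K contiguous allocation, which is what fails with "Cannot allocate memory" above; with --bitmap-chunk=1024 it drops to about 1K.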
RE: Problem in creating RAID5 MD array with kernel 2.6.15
Hi Neil,

Actually I want to measure the performance of a RAID5 MD array while it is rebuilding. To do this I follow these steps:

# mdadm -f /dev/md0 /dev/sda
mdadm: set /dev/sda faulty in /dev/md0
# mdadm -r /dev/md0 /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy

# tail -f /var/log/messages shows:

Apr 11 01:48:11 localhost kernel: raid5: Disk failure on sda, disabling device. Operation continuing on 3 devices
Apr 11 01:48:24 localhost ntpd[3540]: synchronized to LOCAL(0), stratum 10
Apr 11 01:48:24 localhost ntpd[3540]: kernel time sync disabled 0041
Apr 11 01:48:26 localhost kernel: md: cannot remove active disk sda from md0 ...
Apr 11 01:49:26 localhost ntpd[3540]: synchronized to 10.8.0.8, stratum 3
Apr 11 01:50:51 localhost kernel: md: cannot remove active disk sda from md0 ...
Apr 11 01:51:58 localhost kernel: md: cannot remove active disk sda from md0 ...
Apr 11 01:54:16 localhost kernel: md: cannot remove active disk sda from md0 ...
Apr 11 01:57:11 localhost kernel: md: cannot remove active disk sda from md0 ...

I do not understand why I am not able to hot-remove /dev/sda from /dev/md0.

# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Apr 11 01:47:20 2006
     Raid Level : raid5
     Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
    Device Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
  Intent Bitmap : /tmp/bitmap.txt
    Update Time : Tue Apr 11 01:47:20 2006
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : 5ce49b71:e6083c2a:121b9ac2:cb675771
         Events : 0.1

    Number   Major   Minor   RaidDevice   State
       0       8        0        0        faulty spare rebuilding   /dev/sda
       1       8       16        1        active sync   /dev/sdb
       2       8       32        2        active sync   /dev/sdc
       3       8       48        3        active sync   /dev/sdd

This output shows that RAID5 /dev/md0 is in degraded mode. How should I rebuild this RAID5 array so that I can measure I/O performance while the array is rebuilding?

Thanks,
Yogesh
RE: Problem in creating RAID5 MD array with kernel 2.6.15
On Tuesday April 11, [EMAIL PROTECTED] wrote:
> Hi Neil,
> Actually I want to measure the performance of a RAID5 MD array while it
> is rebuilding. To do this I follow these steps:
> # mdadm -f /dev/md0 /dev/sda
> mdadm: set /dev/sda faulty in /dev/md0
> # mdadm -r /dev/md0 /dev/sda
> mdadm: hot remove failed for /dev/sda: Device or resource busy

Hmmm. That shouldn't happen. I think you have found a bug :-(

However I cannot trivially reproduce it.
- Can you reproduce this behaviour (mdadm -r failing)?
- If so, can you list the steps?
- Can you reproduce it on 2.6.16?
- Can you reproduce it without using a bitmap?

If you cannot reproduce it, please tell me as much as possible about what led up to this situation. Did you add/fail other drives? Did you create or mount a filesystem, etc.?

Thanks,
NeilBrown
Questions about: Where to find algorithms for RAID5 / RAID6
Good day.

I am looking for some information, and hope the readers of this list might be able to point me in the right direction. Here is the scenario:

In RAID5 (or RAID6), when a file is written, some parity data is created (by some form of XOR process, I assume), and then that parity data is written to disk. I am looking to find the algorithm that is used to create that parity data and that decides where to place it on the disks.

Any help on this is deeply appreciated.

--
With our best regards,

Maurice W. Hilarius       Telephone: 01-780-456-9771
Hard Data Ltd.            FAX:       01-780-456-9772
11060 - 166 Avenue        email: [EMAIL PROTECTED]
Edmonton, AB, Canada      http://www.harddata.com/
T5X 1Y3
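For reference: in RAID5 the parity block is the byte-wise XOR of the data blocks in the same stripe, and md's default "left-symmetric" layout rotates the parity disk from stripe to stripe; the authoritative code is the kernel's drivers/md/raid5.c (RAID6 adds a second, Reed-Solomon-style syndrome, handled by the raid6 code under drivers/md/). The following is only a rough stand-alone C sketch of both ideas, not the kernel implementation:

#include <stddef.h>
#include <stdio.h>

/* Sketch: parity is the XOR of the data blocks in the same stripe. */
static void make_parity(unsigned char *parity, unsigned char **data,
                        int ndata, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char p = 0;
        for (int d = 0; d < ndata; d++)
            p ^= data[d][i];
        parity[i] = p;
    }
}

/* Sketch of the left-symmetric placement: parity walks backwards across
 * the disks, and data chunks start just after the parity disk and wrap. */
static int parity_disk(long stripe, int ndisks)
{
    return (ndisks - 1) - (int)(stripe % ndisks);
}

static int data_disk(long stripe, int data_idx, int ndisks)
{
    return (parity_disk(stripe, ndisks) + 1 + data_idx) % ndisks;
}

int main(void)
{
    /* tiny parity demo: three 4-byte "blocks" */
    unsigned char d0[4] = {1, 2, 3, 4}, d1[4] = {5, 6, 7, 8}, d2[4] = {9, 10, 11, 12};
    unsigned char *data[3] = {d0, d1, d2}, p[4];
    make_parity(p, data, 3, sizeof(p));
    printf("parity bytes: %u %u %u %u\n", p[0], p[1], p[2], p[3]);

    /* placement demo for a hypothetical 4-disk RAID5 */
    for (long s = 0; s < 4; s++)
        printf("stripe %ld: parity on disk %d, first data chunk on disk %d\n",
               s, parity_disk(s, 4), data_disk(s, 0, 4));
    return 0;
}

Losing one disk then means any missing block can be recomputed as the XOR of the surviving blocks in its stripe.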
RE: Problem in creating RAID5 MD array with kernel 2.6.15
Hi Neil,

When I try the following steps on kernel 2.6.16 I get the same errors as on kernel 2.6.15.

# mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt --bitmap-chunk=1024 -l5 -n4 /dev/sd{a,b,c,d}

The array gets created successfully.

# mdadm /dev/md0 -f /dev/sda
mdadm: set /dev/sda faulty in /dev/md0
# mdadm /dev/md0 -r /dev/sda
mdadm: hot remove failed for /dev/sda: Device or resource busy

# cat /proc/mdstat shows:

Personalities : [raid5] [raid4]
md0 : active raid5 sdd[3] sdc[2] sdb[1] sda[0](F)
      1465159488 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
      bitmap: 0/233 pages [0KB], 1024KB chunk, file: /tmp/bitmap.txt
unused devices: <none>

# mdadm -D /dev/md0 shows:

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Apr 11 03:39:28 2006
     Raid Level : raid5
     Array Size : 1465159488 (1397.29 GiB 1500.32 GB)
    Device Size : 488386496 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
  Intent Bitmap : /tmp/bitmap.txt
    Update Time : Tue Apr 11 03:39:28 2006
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 64K
           UUID : ee25e1fa:d1ee78d5:d63869c6:dd7cff82
         Events : 0.1

    Number   Major   Minor   RaidDevice   State
       0       8        0        0        faulty spare rebuilding   /dev/sda
       1       8       16        1        active sync   /dev/sdb
       2       8       32        2        active sync   /dev/sdc
       3       8       48        3        active sync   /dev/sdd

# tail -f /var/log/messages shows:

Apr 11 03:40:47 localhost kernel: md: cannot remove active disk sda from md0 ...
Apr 11 03:41:29 localhost kernel: raid5: Disk failure on sda, disabling device. Operation continuing on 3 devices
Apr 11 03:41:35 localhost kernel: md: cannot remove active disk sda from md0 ...

Thanks,
Yogesh
RE: Problem in creating RAID5 MD array with kernel 2.6.15
On Tuesday April 11, [EMAIL PROTECTED] wrote:
> Hi Neil,
> When I try the following steps on kernel 2.6.16 I get the same errors as
> on kernel 2.6.15.
> # mdadm -Cv /dev/md0 --assume-clean --force --bitmap=/tmp/bitmap.txt --bitmap-chunk=1024 -l5 -n4 /dev/sd{a,b,c,d}
> The array gets created successfully.
> # mdadm /dev/md0 -f /dev/sda
> mdadm: set /dev/sda faulty in /dev/md0
> # mdadm /dev/md0 -r /dev/sda
> mdadm: hot remove failed for /dev/sda: Device or resource busy

Thanks for testing.

If I do exactly the same sequence it works perfectly. So there must be some important difference between your setup and mine.

- What happens if you leave off the --bitmap=/tmp/bitmap.txt --bitmap-chunk=1024? Does it then work?
- What filesystem is on /tmp?
- Can you give me the complete /var/log/messages from before you create the array until this error?

Thanks,
NeilBrown
RE: Problem in creating RAID5 MD array with kernel 2.6.15
Hi Neil,

Can you provide me with the details of your setup? Is there any kernel configuration option that I will have to change and rebuild my kernel with?

Thanks,
Yogesh
Re: mdadm + raid1 of 2 disks and now need to add more
On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
> I have two SCSI disks on raid1. Since I have lots of reads from that
> raid, I want to add two more disks to this raid so that reads will be
> faster. How should I add the new disks?

Is this possible with md currently:

Create a RAID-10 on the two new disks, specifying one disk missing from each mirror. Then copy the data over and add the two existing disks, letting it resync?
Re: mdadm + raid1 of 2 disks and now need to add more
Andy Smith wrote:
> On Tue, Apr 11, 2006 at 04:41:30PM +0200, Shai wrote:
>> I have two SCSI disks on raid1. Since I have lots of reads from that
>> raid, I want to add two more disks to this raid so that reads will be
>> faster. How should I add the new disks?
>
> Is this possible with md currently:
>
> Create a RAID-10 on the two new disks, specifying one disk missing from
> each mirror. Then copy the data over and add the two existing disks,
> letting it resync?

Why not grow the array with 2 more disks?
Re: mdadm + raid1 of 2 disks and now need to add more
On Tue, Apr 11, 2006 at 07:25:58PM +0200, Laurent CARON wrote:
> Andy Smith wrote:
>> Is this possible with md currently:
>>
>> Create a RAID-10 on the two new disks, specifying one disk missing from
>> each mirror. Then copy the data over and add the two existing disks,
>> letting it resync?
>
> Why not grow the array with 2 more disks?

Well I guess a RAID-1 of 4 disks would be slightly more redundant than a 4-disk RAID-10, but it would have half the capacity, and the read performance would be very similar, no?
Re: linear writes to raid5
Mark Hahn (MH) writes:

MH> don't you mean _3_ chunk-sized writes? if so, are you actually
MH> asking about the case when you issue an aligned two-stripe write?
MH> (which might get broken into 6 64K writes, not sure, rather than
MH> three 2-chunk writes...)

actually, yes. I'm talking about 3 requests: 2 of data and one of parity.
Re: linear writes to raid5
Neil Brown (NB) writes:

NB> The raid5 code attempts to do this already, though I'm not sure how
NB> successful it is. I think it is fairly successful, but not completely
NB> successful.

hmm. could you tell me which code I should look at?

NB> There is a trade-off that raid5 has to make. Waiting longer can mean
NB> more blocks on the same stripe, and so fewer reads. But waiting longer
NB> can also increase latency which might not be good.

yes, I agree.

NB> The thing to do would be to put some tracing in to find out exactly
NB> what is happening for some sample workloads, and then see if anything
NB> can be improved.

well, the simplest case I tried was this:

mdadm -C /dev/md0 --level=5 --chunk=8 --raid-disks=3 ...

then open /dev/md0 with O_DIRECT and send a write of 16K. It ended up doing a few writes and one read. The sequence was:

1) serving the first 4K of the request - put the stripe onto the delayed list
2) serving the 2nd 4KB -- again onto the delayed list
3) serving the 3rd 4KB -- we get a fully uptodate stripe, so it is time to
   compute the parity, and 3 writes are issued for stripe #0
4) raid5_unplug_device() is called because of those 3 writes; it activates
   delayed stripe #4
5) raid5d() finds stripe #4 and issues a READ
...

I tend to think this isn't the most optimal way. Couldn't we take the current request into account somehow? Something like keeping delayed stripes off the queue as long as the current requests aren't fully served and the stripe cache isn't full.

Another similar case is when you have two processes writing to very different stripes, and the low-level requests they make from handle_stripe() cause delayed stripes to get activated.

thanks, Alex
mdadm does not want to create partitions
Hi,

I need help with a RAID-0 array. I would not ask it here if I were able to find information on what to do in my case.

OK, the problem. Our datacenter had a power failure, and after that I am trying to bring up all services on one of our servers, which uses a partitioned RAID-0 array for Oracle tablespaces. It was partitioned, and fdisk sees it:

# fdisk -l /dev/md1

Disk /dev/md1: 366.5 GB, 366555955200 bytes
2 heads, 4 sectors/track, 89491200 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

     Device Boot      Start         End      Blocks   Id  System
/dev/md1p1               1    15869142    63476566   83  Linux
/dev/md1p2        15869143    31738284    63476568   83  Linux
/dev/md1p3        31738285    47607426    63476568   83  Linux
/dev/md1p4        47607427    89491200   167535096    5  Extended
/dev/md1p5        47607427    63476568    63476566   83  Linux
/dev/md1p6        63476569    79345710    63476566   83  Linux
/dev/md1p7        79345711    80566414     4882814   83  Linux
/dev/md1p8        80566415    89491200    35699142   83  Linux

but I have:

# mdadm -A /dev/md1 --auto=part8
mdadm: that --auto option not compatable with device named /dev/md1

It can only be assembled as non-partitioned; --auto=part or mdp do not work either. I need my partitions back, but I don't understand what I can do.

The kernel is from RHEL-4: 2.6.9-22.EL.roothugemem (custom build for RedHat 9). It was working before the power failure.

--
Anton Petrusevich
Re: md/mdadm fails to properly run on 2.6.15 after upgrading from 2.6.11
On Mon, 10 Apr 2006, Marc L. de Bruin wrote:
> dean gaudet wrote:
>> On Mon, 10 Apr 2006, Marc L. de Bruin wrote:
>>> However, all preferred minors are correct, meaning that the output is
>>> in sync with what I expected it to be from /etc/mdadm/mdadm.conf. Any
>>> other ideas? Just adding /etc/mdadm/mdadm.conf to the initrd does not
>>> seem to work, since mdrun seems to ignore it?!
>>
>> it seems to me "mdrun /dev" is about the worst thing possible to use in
>> an initrd. :-)
>
> I guess I'll have to change to yaird asap then. I can't think of any
> other solid solution...

yeah, i've been using yaird... it's not perfect -- take a look at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=351183 for a patch i use to improve the ability of a yaird initrd to boot when you've moved devices or a device has failed.

-dean
forcing a read on a known bad block
hey Neil...

i've been wanting to test out the reconstruct-on-read-error code... and i've had two chances to do so, but haven't been able to force md to read the appropriate block to trigger the code.

i had two disks with SMART Current_Pending_Sector > 0 (which indicates a pending read error) and i did SMART long self-tests to find out where the bad block was (it should show the LBA in the SMART error log)...

one disk was in a raid1 -- and so it was kind of random which of the two disks would be read from if i tried to seek to that LBA and read... in theory with O_DIRECT i should have been able to randomly get the right disk, but that seems a bit clunky. unfortunately i didn't think of the O_DIRECT trick until after i'd given up and decided to just resync the whole disk proactively.

the other disk was in a raid5 ... 5 disk raid5, so 20% chance of the bad block being in parity. i copied the kernel code to be sure, and sure enough the bad block was in parity... just bad luck :) so i can't force a read there any way that i know of...

anyhow this made me wonder if there's some other existing trick to force such reads/reconstructions to occur... or perhaps this might be a useful future feature.

on the raid5 disk i actually tried reading the LBA directly from the component device and it didn't trigger the read error, so now i'm a bit skeptical of the SMART log and/or my computation of the seek offset in the partition... but the above question is still interesting.

-dean
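For reference, the O_DIRECT trick mentioned above amounts to something like the following rough C sketch. The device path and LBA are hypothetical placeholders, the read is aligned to 4K as O_DIRECT requires, and on a raid1 it is still up to md which mirror actually services the request:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char *dev = "/dev/md0";            /* hypothetical array          */
    unsigned long long lba = 123456789ULL;   /* hypothetical suspect sector */
    void *buf;

    /* O_DIRECT needs an aligned buffer and aligned offset/length */
    if (posix_memalign(&buf, 4096, 4096))
        return 1;

    int fd = open(dev, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* read the 4K block containing the suspect sector, bypassing the cache */
    off_t off = (off_t)(lba * 512 / 4096) * 4096;
    ssize_t n = pread(fd, buf, 4096, off);
    if (n < 0)
        perror("pread");   /* a media error here would exercise md's read-error path */
    else
        printf("read %zd bytes at offset %lld\n", n, (long long)off);

    close(fd);
    free(buf);
    return 0;
}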
Re: mdadm + raid1 of 2 disks and now need to add more
On Tue, 2006-04-11 at 20:32, Andy Smith wrote:
> On Tue, Apr 11, 2006 at 07:25:58PM +0200, Laurent CARON wrote:
>> Why not grow the array with 2 more disks?
>
> Well I guess a RAID-1 of 4 disks would be slightly more redundant than a
> 4-disk RAID-10, but it would have half the capacity, and the read
> performance would be very similar, no?

raid1 of 4 will give u read performance like 1 disk; raid10 of 4 can give u read performance like aggregated 2 disks.
Re: forcing a read on a known bad block
hi dean,

dean gaudet wrote:
> the other disk was in a raid5 ... 5 disk raid5, so 20% chance of the bad
> block being in parity. i copied the kernel code to be sure, and sure
> enough the bad block was in parity... just bad luck :) so i can't force
> a read there any way that i know of...

well, for raid5 you can use 'echo repair > /sys/block/mdX/md/sync_action'.

This does a 'simulated reconstruction' and has triggered this for me in the past. (For some reason 'check' instead of 'repair' did not, even though it should have tried to read all the blocks then, too...)

That said, I have one disk in an 8-disk raid5 that says 'current pending sector 1', and another that says 'offline uncorrectable 1', and they have been doing so for months. Neither SMART extended tests nor full raid5 resyncs have either failed or fixed this, so I don't know what's up with that...

cheers,
/Patrik
Re: [RAID] forcing a read on a known bad block
On Tue, 11 Apr 2006, dean gaudet wrote:
> anyhow this made me wonder if there's some other existing trick to force
> such reads/reconstructions to occur... or perhaps this might be a useful
> future feature.

For testing RAID, what would be really nice is if there were a virtual disk device where one could simulate bad sectors (read or write), non-responsive disks, etc. It would be virtual in the same sort of way that /dev/full simulates a full disk.

This would be an ideal project for Xen.
RE: mdadm + raid1 of 2 disks and now need to add more
} On Tue, 2006-04-11 at 20:32, Andy Smith wrote:
} > Well I guess a RAID-1 of 4 disks would be slightly more redundant
} > than a 4-disk RAID-10, but it would have half the capacity, and the
} > read performance would be very similar, no?
}
} raid1 of 4 will give u read performance like 1 disk;
} raid10 of 4 can give u read performance like aggregated 2 disks.

I know RAID1 of 4 disks will give you read performance like 4 disks. Unless your test or application is single threaded! Not likely, I hope! At least with the 2.4.31 kernel.

A RAID1 of 4 disks will still function with any 3 failed disks. A real good idea for remote systems.

Also, I think RAID10 of 4 can give you read performance like 2 to 4 disks. This depends more on the application, IMO. You would have twice the space, but maybe that's not needed.
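To see the difference being debated here in practice, a multi-threaded read benchmark along the lines of the following rough C sketch (hypothetical device path and sizes) can be compared against a single-threaded run: a single sequential reader on a raid1 is served largely from one mirror, while several concurrent readers on different regions let md spread the requests across the mirrors.

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS   4
#define BLOCK      (1024 * 1024)           /* 1MB per read            */
#define PER_THREAD (256LL * BLOCK)         /* 256MB read per thread   */

static const char *dev = "/dev/md0";       /* hypothetical raid1/raid10 array */

static void *reader(void *arg)
{
    long id = (long)arg;
    char *buf = malloc(BLOCK);
    int fd = open(dev, O_RDONLY);
    if (fd < 0 || !buf) { perror("open"); return NULL; }

    /* each thread reads its own disjoint region sequentially */
    off_t off = (off_t)id * PER_THREAD;
    for (long long done = 0; done < PER_THREAD; done += BLOCK)
        if (pread(fd, buf, BLOCK, off + done) <= 0)
            break;

    close(fd);
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    /* time the run (e.g. with 'time') and compare NTHREADS=1 vs NTHREADS=4 */
    return 0;
}

Build with something like 'gcc -O2 -o readbench readbench.c -lpthread'; dropping the page cache between runs keeps the comparison honest.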
Re: accessing mirrored lvm on shared storage
On Friday April 7, [EMAIL PROTECTED] wrote:
> Unfortunately md lacks the ability to mark an array as
> used/busy/you_name_it. Some time ago I asked on this list for such an
> enhancement (see the thread with subject "Question: array locking,
> possible"). Although I managed (with great help from a few people on
> this list) to attract Neil's attention, I couldn't find enough arguments
> to convince him to put this topic on his TO-DO list. Neil, you see the
> constantly growing number of potential users of this feature? ;-)

I don't think that just marking an array "don't mount" is really a useful solution. And if it was, it would be something done in 'mdadm' rather than in 'md'.

What you really want is cluster-wide locking using DLM or similar. That way, when the node which has active use of the array fails, another node can pick it up automatically. Then we could put a flag in the superblock which says 'shared', and md would need a special request to assemble such an array.

One thing that is on my todo list is supporting shared raid1, so that several nodes in the cluster can assemble the same raid1 and access it - providing that the clients all do proper mutual exclusion, as e.g. OCFS does. Your desire to have only-assembled-once would be trivial to include in that.

NeilBrown
Re: [RAID] forcing a read on a known bad block
On Tue, Apr 11, 2006 at 12:37:53PM -1000, Julian Cowley wrote:
> On Tue, 11 Apr 2006, dean gaudet wrote:
>> anyhow this made me wonder if there's some other existing trick to
>> force such reads/reconstructions to occur... or perhaps this might be
>> a useful future feature.
>
> For testing RAID, what would be really nice is if there were a virtual
> disk device where one could simulate bad sectors (read or write),
> non-responsive disks, etc. It would be virtual in the same sort of way
> that /dev/full simulates a full disk.

either use the MD "faulty" personality, or the device-mapper "error" target.

L.

--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.