Re: + md-raid10-fix-use-after-free-of-bio.patch added to -mm tree
On Saturday July 28, [EMAIL PROTECTED] wrote:
> The patch titled
>      md: raid10: fix use-after-free of bio
> has been added to the -mm tree.  Its filename is
>      md-raid10-fix-use-after-free-of-bio.patch
>
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
> See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
> out what to do about this
>
> --
> Subject: md: raid10: fix use-after-free of bio
> From: Maik Hampel <[EMAIL PROTECTED]>
>
> In case of read errors raid10d tries to print a nice error message,
> unfortunately using data from an already put bio.

Thanks for catching that Maik!

> diff -puN drivers/md/raid10.c~md-raid10-fix-use-after-free-of-bio drivers/md/raid10.c
> --- a/drivers/md/raid10.c~md-raid10-fix-use-after-free-of-bio
> +++ a/drivers/md/raid10.c
> @@ -1534,7 +1534,6 @@ static void raid10d(mddev_t *mddev)
>  		bio = r10_bio->devs[r10_bio->read_slot].bio;
>  		r10_bio->devs[r10_bio->read_slot].bio =
>  			mddev->ro ? IO_BLOCKED : NULL;
> -		bio_put(bio);
>  		mirror = read_balance(conf, r10_bio);
>  		if (mirror == -1) {
>  			printk(KERN_ALERT "raid10: %s: unrecoverable I/O
> @@ -1542,8 +1541,10 @@ static void raid10d(mddev_t *mddev)
>  				bdevname(bio->bi_bdev,b),
>  				(unsigned long long)r10_bio->sector);
>  			raid_end_bio_io(r10_bio);
> +			bio_put(bio);

and for catching that Andrew!

Acked-By: NeilBrown <[EMAIL PROTECTED]>

>  		} else {
>  			const int do_sync = bio_sync(r10_bio->master_bio);
> +			bio_put(bio);
>  			rdev = conf->mirrors[mirror].rdev;
>  			if (printk_ratelimit())
>  				printk(KERN_ERR "raid10: %s: redirecting sector %llu to

> Patches currently in -mm which might be from [EMAIL PROTECTED] are
>
> md-raid10-fix-use-after-free-of-bio.patch

-
To unsubscribe from this list: send the line "unsubscribe linux-raid"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid1 resync data direction defined?
On Fri, Jul 27, 2007 at 03:07:13PM +0200, Frank van Maarseveen wrote:
> I'm experimenting with a live migration of /dev/sda1 using mdadm -B and
> network block device as in:
>
> 	mdadm -B -ayes -n2 -l1 /dev/md1 /dev/sda1 \
> 		--write-mostly -b /tmp/bitm$$ --write-behind /dev/nbd1

not a good idea

> /dev/sda1 is to be migrated. During the migration the local system
> mounts from /dev/md1 instead. Stracing shows that data flows to the
> remote side. But when I do
>
> 	echo repair >/sys/block/md1/md/sync_action
>
> then the data flows in the other direction: the local disk is written
> using data read from the remote side.

I believe stracing nbd will give you a partial view of what happens.
Anyway, in the first case, since the second device is write-mostly, all
data is read from the local device and changes are written to the remote
one. In the second case the data is read from both sides to be compared;
that is what you are seeing in strace. I am unsure which copy is
considered correct, since md does not have info about that.

> If that would happen in the first command then it would destroy all

yes

> data instead of migrating it so I wonder if this behavior is defined:

no

> Do mdadm --build and mdadm --create always use the first component
> device on the command-line as the source for raid1 resync?

no

If you are doing a migration, build the initial array with the second
device as missing, then hot-add it and it will resync correctly, i.e.

	mdadm -B -ayes -n2 -l1 /dev/md1 /dev/sda1 \
		--write-mostly -b /tmp/bitm$$ --write-behind missing
	mdadm -a /dev/md1 /dev/nbd1

--
Luca Berra -- [EMAIL PROTECTED]
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \
Re: md: raid10: fix use-after-free of bio
On Saturday, 28.07.2007, at 23:55 -0700, Andrew Morton wrote:
> On Fri, 27 Jul 2007 16:46:23 +0200 Maik Hampel <[EMAIL PROTECTED]> wrote:
>
> > In case of read errors raid10d tries to print a nice error message,
> > unfortunately using data from an already put bio.
> >
> > Signed-off-by: Maik Hampel <[EMAIL PROTECTED]>
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index f730a14..ea1b3e3 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -1557,7 +1557,6 @@ static void raid10d(mddev_t *mddev)
> >  		bio = r10_bio->devs[r10_bio->read_slot].bio;
> >  		r10_bio->devs[r10_bio->read_slot].bio =
> >  			mddev->ro ? IO_BLOCKED : NULL;
> > -		bio_put(bio);
> >  		mirror = read_balance(conf, r10_bio);
> >  		if (mirror == -1) {
> >  			printk(KERN_ALERT "raid10: %s: unrecoverable I/O
> > @@ -1567,6 +1566,7 @@ static void raid10d(mddev_t *mddev)
> >  			raid_end_bio_io(r10_bio);
> >  		} else {
> >  			const int do_sync = bio_sync(r10_bio->master_bio);
> > +			bio_put(bio);
> >  			rdev = conf->mirrors[mirror].rdev;
> >  			if (printk_ratelimit())
> >  				printk(KERN_ERR "raid10: %s: redirecting sector %llu to
>
> Surely we just leaked that bio if (mirror == -1)?  Better:
>
> --- a/drivers/md/raid10.c~md-raid10-fix-use-after-free-of-bio
> +++ a/drivers/md/raid10.c
> @@ -1534,7 +1534,6 @@ static void raid10d(mddev_t *mddev)
>  		bio = r10_bio->devs[r10_bio->read_slot].bio;
>  		r10_bio->devs[r10_bio->read_slot].bio =
>  			mddev->ro ? IO_BLOCKED : NULL;
> -		bio_put(bio);
>  		mirror = read_balance(conf, r10_bio);
>  		if (mirror == -1) {
>  			printk(KERN_ALERT "raid10: %s: unrecoverable I/O
> @@ -1542,8 +1541,10 @@ static void raid10d(mddev_t *mddev)
>  				bdevname(bio->bi_bdev,b),
>  				(unsigned long long)r10_bio->sector);
>  			raid_end_bio_io(r10_bio);
> +			bio_put(bio);

raid_end_bio_io() calls put_all_bios(), which does a bio_put() on the
corresponding r10_bio->devs[i].bio. So this looks like redundant code
to me.

>  		} else {
>  			const int do_sync = bio_sync(r10_bio->master_bio);
> +			bio_put(bio);
>  			rdev = conf->mirrors[mirror].rdev;
>  			if (printk_ratelimit())
>  				printk(KERN_ERR "raid10: %s: redirecting sector %llu to

Regards,
Maik Hampel
Re: Is it possible to grow a RAID-10 array with mdadm?
On Sunday July 29, [EMAIL PROTECTED] wrote:
> Hi everyone,
>
> Is it possible to add drives to an active RAID-10 array, using the grow
> switch with mdadm, just like it is possible with a RAID-5 array? Or
> perhaps there is another way? I have been looking for this information
> for a long time but have been unable to find it anywhere. The man page
> for mdadm does not mention RAID-10 at all so that didn't help either.
> Has anyone tried it?

The man page for mdadm does not mention it because it is not supported.

There are several reshape options that I would like to implement,
including:
 - raid5 -> raid6
 - shrinking raid4/5/6
 - raid0 -> raid5
 - changing chunksize/layout of raid4/5/6
 - raid10 growing and layout change

Unfortunately I haven't yet found/made the time.

Patches are always welcome :-)

NeilBrown
Re: Is it possible to grow a RAID-10 array with mdadm?
Thanks for the answer Neil!

> The man page for mdadm does not mention it because it is not supported.

It doesn't actually even mention the possibility to create a RAID-10
array (without creating RAID-0 on top of RAID-1 pairs), yet from the
info I found, a lot of people have been using it for quite a while.
Almost as if it were a complete secret ;)

As for the RAID-10 growing / layout change - I'd absolutely love to see
that implemented in the (hopefully near) future. IMHO, RAID-10 is
becoming very popular because of the falling hard drive prices.

Tomas

----- Original Message -----
From: "Neil Brown" <[EMAIL PROTECTED]>
To: "Tomas France" <[EMAIL PROTECTED]>
Cc: <linux-raid@vger.kernel.org>
Sent: Monday, July 30, 2007 10:48 AM
Subject: Re: Is it possible to grow a RAID-10 array with mdadm?
Re: Is it possible to grow a RAID-10 array with mdadm?
On Monday July 30, [EMAIL PROTECTED] wrote:
> Thanks for the answer Neil!
>
> > The man page for mdadm does not mention it because it is not supported.
>
> It doesn't actually even mention the possibility to create a RAID-10
> array (without creating RAID-0 on top of RAID-1 pairs), yet from the
> info I found, a lot of people have been using it for quite a while.
> Almost as if it was a complete secret ;)

What version of mdadm do you have installed? (The bottom of the man page
will tell you.) My v2.6.2 manpage mentions raid10 5 times, and "man md"
mentions it 9 times.

If you have a recent mdadm and there was some particular place in the
man page where you were looking and didn't find raid10, please let me
know and I will try to improve that part of the documentation.

Thanks,
NeilBrown
bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
CONFIG:

Software RAID 5 (400GB x 6): Default mkfs parameters for all
filesystems. Kernel was 2.6.21 or 2.6.22, did these a while ago.
Hardware was SATA with PCI-e only, nothing on the PCI bus. ZFS was
userspace+fuse of course. Reiser was V3. EXT4 was created using the
recommended options on its project page.

RAW:

ext2,7760M,56728,96.,180505,51,85484,17.,50946.7,80.,235541,21.,373.667,0,16:10:16/64,2354,27,0,0,8455.67,14.6667,2211.67,26.,0,0,9724,22.
ext3,7760M,52702.7,94.,165005,60,82294.7,20.6667,52664,83.6667,258788,33.,335.8,0,16:10:16/64,858.333,10.6667,10250.3,28.6667,4084,15,897,12.6667,4024.33,12.,2754,11.
ext4,7760M,53129.7,95,164515,59.,101678,31.6667,62194.3,98.6667,266716,22.,405.767,0,16:10:16/64,1963.67,23.6667,0,0,20859,73.6667,1731,21.,9022,23.6667,16410,65.6667
jfs,7760M,54606,92,191997,52,112764,33.6667,63585.3,99,274921,22.,383.8,0,16:10:16/64,344,1,0,0,539.667,0,297.667,1,0,0,340,0
reiserfs,7760M,51056.7,96,180607,67,106907,38.,61231.3,97.6667,275339,29.,441.167,0,16:10:16/64,2516,60.6667,19174.3,60.6667,8194.33,54.,2011,42.6667,6963.67,19.6667,9168.33,68.6667
xfs,7760M,52985.7,93,158342,45,79682,14,60547.3,98,239101,20.,359.667,0,16:10:16/64,415,4,0,0,1774.67,10.6667,454,4.7,14526.3,40,1572,12.6667
zfs,7760M,25601,43.,32198.7,4,13266.3,2,44145.3,68.6667,129278,9,245.167,0,16:10:16/64,218.333,2,2698.33,11.6667,7434.67,14.,244,2,2191.33,11.6667,5613.33,13.

HTML:

http://home.comcast.net/~jpiszcz/benchmark/allfs.html

THOUGHTS:

Overall JFS seems the fastest, but reviewing the mailing list for JFS it
seems like there are a lot of problems; in particular, for some people
who have used JFS for a year, throughput drops to 5 MiB/s over time, the
defragfs tool has been removed(?) from the source/Makefile, and Google
results say not to use it due to corruption.

Justin.
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
[trimmed all but linux-raid from the cc]

On 7/30/07, Justin Piszcz <[EMAIL PROTECTED]> wrote:
> CONFIG:
>
> Software RAID 5 (400GB x 6): Default mkfs parameters for all
> filesystems. Kernel was 2.6.21 or 2.6.22, did these a while ago.

Can you give 2.6.22.1-iop1 a try to see what effect it has on sequential
write performance?

Download:
http://downloads.sourceforge.net/xscaleiop/patches-2.6.22.1-iop1-x86fix.tar.gz

Unpack into your 2.6.22.1 source tree. Install the x86 series file:
"cp patches/series.x86 patches/series". Apply the series with quilt:
"quilt push -a".

I recommend trying the default chunk size and default stripe_cache_size,
as my tests have shown improvement without needing to perform any
tuning.

Thanks,
Dan
Re: Homehost suddenly changed on some components
For the record:

After reading in the archives about similar problems, which were
probably caused by something else but still close enough, I recreated
the array with the exact same parameters from the superblock and one
missing disk:

	mdadm -C /dev/md0 -l 5 -n 10 -c 64 -p ls /dev/sdb1 /dev/sdd1 \
		/dev/sde1 /dev/hde1 /dev/hdb1 /dev/hdf1 /dev/hdh1 \
		/dev/hdg1 /dev/sdc1 missing

Seems to have done the trick; fsck is working right now.

Funny things seem to happen to superblocks more often than I thought.
Recreating with one missing disk appears more like a hack than a
solution to me. Maybe mdadm should have some kind of explicit superblock
manipulation, like copying from other components or importing/exporting
from/to a file, so such problems can be solved in a safe way? Just a
quick thought. :)

--
Regards,
Max Amanshauser
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
Justin Piszcz wrote:
> CONFIG: Software RAID 5 (400GB x 6): Default mkfs parameters for all
> filesystems. Kernel was 2.6.21 or 2.6.22, did these awhile ago.
> Hardware was SATA with PCI-e only, nothing on the PCI bus. ZFS was
> userspace+fuse of course.

Wow! Userspace and still that efficient.

> Reiser was V3. EXT4 was created using the recommended options on its
> project page.
>
> RAW:
> ext2,7760M,56728,96.,180505,51,85484,17.,50946.7,80.,235541,21.,373.667,0,16:10:16/64,2354,27,0,0,8455.67,14.6667,2211.67,26.,0,0,9724,22.
> ext3,7760M,52702.7,94.,165005,60,82294.7,20.6667,52664,83.6667,258788,33.,335.8,0,16:10:16/64,858.333,10.6667,10250.3,28.6667,4084,15,897,12.6667,4024.33,12.,2754,11.
> ext4,7760M,53129.7,95,164515,59.,101678,31.6667,62194.3,98.6667,266716,22.,405.767,0,16:10:16/64,1963.67,23.6667,0,0,20859,73.6667,1731,21.,9022,23.6667,16410,65.6667
> jfs,7760M,54606,92,191997,52,112764,33.6667,63585.3,99,274921,22.,383.8,0,16:10:16/64,344,1,0,0,539.667,0,297.667,1,0,0,340,0
> reiserfs,7760M,51056.7,96,180607,67,106907,38.,61231.3,97.6667,275339,29.,441.167,0,16:10:16/64,2516,60.6667,19174.3,60.6667,8194.33,54.,2011,42.6667,6963.67,19.6667,9168.33,68.6667
> xfs,7760M,52985.7,93,158342,45,79682,14,60547.3,98,239101,20.,359.667,0,16:10:16/64,415,4,0,0,1774.67,10.6667,454,4.7,14526.3,40,1572,12.6667
> zfs,7760M,

Dissecting some of the zfs numbers:

	  speed     %cpu
	  25601     43.
	32198.7     4
	13266.3     2
	44145.3     68.6667
	 129278     9
	245.167     0

	16:10:16/64

	  speed     %cpu
	218.333     2
	2698.33     11.6667
	7434.67     14.
	    244     2
	2191.33     11.6667
	5613.33     13.

Extrapolating these %cpu numbers makes ZFS the fastest. Are you sure
these numbers are correct?

Thanks!
--
Al
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
> Extrapolating these %cpu numbers makes ZFS the fastest. Are you sure
> these numbers are correct?

Note that %cpu numbers for fuse filesystems are inherently skewed,
because the CPU usage of the filesystem process itself is not taken into
account.

So the numbers are not all that good, but according to the zfs-fuse
author it hasn't been optimized yet, so they may improve.

Miklos
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
On Mon, 2007-07-30 at 10:29 -0400, Justin Piszcz wrote:
> Overall JFS seems the fastest but reviewing the mailing list for JFS it
> seems like there a lot of problems, especially when people who use JFS
> 1 year, their speed goes to 5 MiB/s over time and the defragfs tool has
> been removed(?) from the source/Makefile and on Google it says not to
> use it due to corruption.

The defragfs tool was an unported holdover from OS/2, which is why it
was removed. There never was a working Linux version. I have some ideas
to improve jfs allocation to avoid fragmentation problems, but jfs isn't
my full-time job anymore, so I can't promise anything.

I'm not sure about the corruption claims. I'd like to hear some
specifics on that.

Anyway, for enterprise use, I couldn't recommend jfs, since there is no
full-time maintainer.

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
On Mon, 30 Jul 2007, Miklos Szeredi wrote:
> > Extrapolating these %cpu numbers makes ZFS the fastest. Are you sure
> > these numbers are correct?
>
> Note that %cpu numbers for fuse filesystems are inherently skewed,
> because the CPU usage of the filesystem process itself is not taken
> into account.
>
> So the numbers are not all that good, but according to the zfs-fuse
> author it hasn't been optimized yet, so they may improve.
>
> Miklos

This was performed on an E6300; one core was taken up by ZFS/FUSE (or
quite a bit of it, anyway).

Justin.
Re: bonnie++ benchmarks for ext2,ext3,ext4,jfs,reiserfs,xfs,zfs on software raid 5
On Mon, 30 Jul 2007, Dan Williams wrote:
> [trimmed all but linux-raid from the cc]
>
> Can you give 2.6.22.1-iop1 a try to see what effect it has on
> sequential write performance?
>
> Download:
> http://downloads.sourceforge.net/xscaleiop/patches-2.6.22.1-iop1-x86fix.tar.gz
>
> Unpack into your 2.6.22.1 source tree. Install the x86 series file:
> "cp patches/series.x86 patches/series". Apply the series with quilt:
> "quilt push -a".
>
> I recommend trying the default chunk size and default
> stripe_cache_size as my tests have shown improvement without needing
> to perform any tuning.
>
> Thanks,
> Dan

Will keep in mind for the next test, but like I said these were from a
while ago.

Justin.
[PATCHSET/RFC] Refactor block layer to improve support for stacked devices.
Hi,
I have just sent a patch-set to linux-kernel that touches quite a number
of block device drivers, with particular relevance to md and dm. Rather
than fill lots of people's mailboxes multiple times (35 patches in the
set), I only sent the full set to linux-kernel, and am just sending this
single notification to other relevant lists.

If you want to look at the patch set (and please do) and are not
subscribed to linux-kernel, you can view it here:

	http://lkml.org/lkml/2007/7/30/468

or ask and I'll send you all 35 patches.

Below is the introductory email.

Thanks,
NeilBrown

From: NeilBrown <[EMAIL PROTECTED]>
Sender: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: Tue, 31 Jul 2007 12:15:45 +1000

The following 35(!) patches achieve a refactoring of some parts of the
block layer to provide better support for stacked devices.

The core issue is that of letting bio_add_page know the limitations that
the device imposes, so that it doesn't create a bio that is too large.

For an unstacked disk device (e.g. scsi), bio_add_page can access
max_nr_sectors and max_nr_segments and some other details to know how
segments should be counted, and does the appropriate checks (this is a
simplification, but it is close enough for this discussion).

For stacked devices (dm, md etc.) bio_add_page can also call into the
driver via merge_bvec_fn to find out if a page can be added to a bio.
This mostly works for a simple stack (e.g. md on scsi) but breaks down
with more complicated stacks (dm on md on scsi), as the recursive calls
to merge_bvec_fn that are required are difficult to get right, and don't
provide any guarantees in the face of array reconfiguration anyway. dm
and md both take the approach of "if the next level down defines
merge_bvec_fn, then set max_sectors to PAGE_SIZE/512 and live with small
requests".

So this patchset introduces a new approach: bio_add_page is allowed to
create bios as big as it likes, and each layer is responsible for
splitting that bio up as required.
For intermediate levels like raid0, a number of new bios might be
created which refer to parts of the original, including parts of the
bi_io_vec. For the bottom-level driver (__make_request), each struct
request can refer to just part of a bio, so a bio can be effectively
split among several requests (a request can still reference multiple
small bios, and could conceivably list parts of large bios and some
small bios as well, though the merging required to achieve this isn't
implemented yet - the patch set is big enough as it is).

This requires that the bi_io_vec become immutable, and that certain
parts of the bio become immutable. To achieve this, we introduce fields
into the bio so that it can point to just part of the bi_io_vec (an
offset and a size) and introduce similar fields into 'struct request' to
refer to only part of a bio list.

I am keen to receive both review and testing. I have tested it on SATA
drives with a range of md configurations, but haven't tested dm, or
ide-floppy, or various other bits that needed to be changed.

Probably the changes that are most likely to raise eyebrows involve the
code to iterate over the segments in a bio or in a 'struct request', so
I'll give a bit more detail about them here.

Previously these (bio_for_each_segment, rq_for_each_bio) were simple
macros that provided pointers into bi_io_vec. As the actual segments
that a request might need to handle may no longer be explicitly in
bi_io_vec (e.g. an offset might need to be added, or a size restriction
might need to be imposed) this is no longer possible. Instead, these
functions (now rq_for_each_segment and bio_for_each_segment) fill in a
'struct bio_vec' with appropriate values, e.g.
	struct bio_vec bvec;
	struct bio_iterator i;
	bio_for_each_segment(bvec, bio, i)
		/* use bvec.bv_page, bvec.bv_offset, bvec.bv_len */

This might seem like data is being copied around a bit more, but it
should all be in L1 cache and could conceivably be optimised into
registers by the compiler, so I don't believe this is a big problem (no,
I haven't figured out a good way to test it).

To achieve this, the for_each macros are now somewhat more complex. For
example, bio_for_each_segment_offset is:

	#define bio_for_each_segment_offset(bv, bio, _i, offs, _size)	\
		for (_i.i = 0, _i.offset = (bio)->bi_offset + offs,	\
		     _i.size = min_t(int, _size, (bio)->bi_size - offs); \
		     _i.i < (bio)->bi_vcnt && _i.size > 0;		\
		     _i.i++)						\
			if (bv = *bio_iovec_idx((bio), _i.i),		\
			    bv.bv_offset += _i.offset,			\
			    bv.bv_len <= _i.offset			\
			    ? (_i.offset -= bv.bv_len, 0)		\
			    : (bv.bv_len -= _i.offset,			\
			       _i.offset = 0,
[patch 07/26] md: Fix two raid10 bugs.
-stable review patch. If anyone has any objections, please let us know.

------------------

1/ When resyncing a degraded raid10 which has more than 2 copies of each
   block, garbage can get synced on top of good data.
2/ We round the wrong way in part of the device size calculation, which
   can cause confusion.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/md/raid10.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- linux-2.6.21.6.orig/drivers/md/raid10.c
+++ linux-2.6.21.6/drivers/md/raid10.c
@@ -1867,6 +1867,7 @@ static sector_t sync_request(mddev_t *md
 			int d = r10_bio->devs[i].devnum;
 			bio = r10_bio->devs[i].bio;
 			bio->bi_end_io = NULL;
+			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
 				continue;
@@ -2037,6 +2038,11 @@ static int run(mddev_t *mddev)
 	/* 'size' is now the number of chunks in the array */
 	/* calculate used chunks per device in 'stride' */
 	stride = size * conf->copies;
+
+	/* We need to round up when dividing by raid_disks to
+	 * get the stride size.
+	 */
+	stride += conf->raid_disks - 1;
 	sector_div(stride, conf->raid_disks);
 	mddev->size = stride << (conf->chunk_shift-1);
--
[patch 08/26] md: Fix bug in error handling during raid1 repair.
-stable review patch. If anyone has any objections, please let us know.

------------------
From: Mike Accetta <[EMAIL PROTECTED]>

If raid1/repair (which reads all blocks and fixes any differences it
finds) hits a read error, it doesn't reset the bio for writing before
writing correct data back, so the read error isn't fixed, and the device
probably gets a zero-length write which it might complain about.

Signed-off-by: Neil Brown <[EMAIL PROTECTED]>
Signed-off-by: Chris Wright <[EMAIL PROTECTED]>
Signed-off-by: Greg Kroah-Hartman <[EMAIL PROTECTED]>

---
 drivers/md/raid1.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- linux-2.6.21.6.orig/drivers/md/raid1.c
+++ linux-2.6.21.6/drivers/md/raid1.c
@@ -1240,17 +1240,24 @@ static void sync_request_write(mddev_t *
 	}
 	r1_bio->read_disk = primary;
 	for (i=0; i<mddev->raid_disks; i++)
-		if (r1_bio->bios[i]->bi_end_io == end_sync_read &&
-		    test_bit(BIO_UPTODATE, &r1_bio->bios[i]->bi_flags)) {
+		if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
 			int j;
 			int vcnt = r1_bio->sectors >> (PAGE_SHIFT-9);
 			struct bio *pbio = r1_bio->bios[primary];
 			struct bio *sbio = r1_bio->bios[i];
-			for (j = vcnt; j-- ; )
-				if (memcmp(page_address(pbio->bi_io_vec[j].bv_page),
-					   page_address(sbio->bi_io_vec[j].bv_page),
-					   PAGE_SIZE))
-					break;
+
+			if (test_bit(BIO_UPTODATE, &sbio->bi_flags)) {
+				for (j = vcnt; j-- ; ) {
+					struct page *p, *s;
+					p = pbio->bi_io_vec[j].bv_page;
+					s = sbio->bi_io_vec[j].bv_page;
+					if (memcmp(page_address(p),
+						   page_address(s),
+						   PAGE_SIZE))
+						break;
+				}
+			} else
+				j = 0;
 			if (j >= 0)
 				mddev->resync_mismatches += r1_bio->sectors;
 			if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
--