Re: RAID10 far (f2) read throughput on random and sequential / read-ahead
I made a reference to your work in the wiki howto on performance. Thanks!

Keld

On Fri, Feb 22, 2008 at 04:14:05AM +, Nat Makarevitch wrote:

'md' performs wonderfully. Thanks to every contributor! I pitted it against a 3ware 9650 and 'md' won on nearly every count (albeit on RAID5 for sequential I/O the 3ware is a distant winner): http://www.makarevitch.org/rant/raid/#3wmd

On RAID10 f2 a small read-ahead reduces the throughput on sequential read, but even a low value (768 for the whole 'md' block device, 0 for the underlying spindles) enables very good sequential read performance (300 MB/s on 6 low-end Hitachi 500 GB spindles).

What baffles me is that, on a 1.4 TB array served by a box having 12 GB RAM (low cache-hit ratio), the random access performance remains stable and high (450 IOPS with 48 threads, 20% writes - 10% fsync'ed), even with a fairly high read-ahead (16k). How come?!

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
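A back-of-envelope check (my own numbers, not from the post) suggests the stable 450 IOPS figure is roughly what six 7200 rpm spindles can deliver when seek-bound, which would explain why read-ahead barely matters for random access. The seek time below is an assumed typical value, not a measured one:

```python
# Sanity check (hypothetical figures): does 450 IOPS across 6 spindles
# match what low-end 7200 rpm disks can sustain for random I/O?
def per_spindle_iops(total_iops, spindles):
    return total_iops / spindles

def expected_iops_7200rpm(avg_seek_ms=8.5, rotation_ms=60000 / 7200):
    # average rotational latency is half a revolution
    service_ms = avg_seek_ms + rotation_ms / 2
    return 1000 / service_ms

print(per_spindle_iops(450, 6))        # 75.0 IOPS per spindle
print(round(expected_iops_7200rpm()))  # 79 -- same ballpark
```

If the per-spindle rate is already near the mechanical limit, a large read-ahead mostly wastes bandwidth the drives were not going to use for anything else, which may be why the measured IOPS stays high.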
Re: suns raid-z / zfs
On Mon, Feb 18, 2008 at 09:51:15PM +1100, Neil Brown wrote:
On Monday February 18, [EMAIL PROTECTED] wrote:
On Mon, Feb 18, 2008 at 03:07:44PM +1100, Neil Brown wrote:
On Sunday February 17, [EMAIL PROTECTED] wrote:

Hi
It seems like a good way to avoid the performance problems of raid-5/raid-6

I think there are better ways.

Interesting! What do you have in mind?

A Log Structured Filesystem always does large contiguous writes. Aligning these to the raid5 stripes wouldn't be too hard and then you would never have to do any pre-reading.

and what are the problems with zfs?

Recovery after a failed drive would not be an easy operation, and I cannot imagine it being even close to the raw speed of the device.

I thought this was a problem with most raid types: while reconstructing, performance is quite slow. And as there has been some damage, this is expected. And there probably is not much ado about it. Or is there? Are there any RAID types that perform reasonably well given that one disk is under repair? The performance could be crucial for some applications. One could think of clever arrangements so that, say, two disks could go down and the rest of an array with 10-20 drives could still function reasonably well, even under the reconstruction.

As far as I can tell from the code, the reconstruction itself is not impeding normal performance much, as normal operation bars reconstruction operations. Hmm, my understanding would then be, for both random reads and writes, that performance in typical raids would only be reduced by the IO bandwidth of the failing disks. For sequential R/W, performance for raid10,f would be hurt, downgrading its performance to random IO for the drives involved. Raid5/6 would be hurt much for reading, as all drives need to be read to give correct information during reconstruction. So it looks like, if your performance is important under a reconstruction, then you should avoid raid5/6 and use the mirrored raid types.
Given you have a big operation, with a load balance of a lot of random reading and writing, it does not matter much which mirrored raid type you would choose, as they all perform about equally for random IO, even when reconstructing. Is that correct advice?

But does it stripe? One could think that rewriting stripes other places would damage the striping effects.

I'm not sure what you mean exactly. But I suspect your concerns here are unjustified.

More precisely: I understand that zfs always writes the data anew. That would mean at other blocks on the partitions, for the logical blocks of the file in question. So the blocks on the partitions will not be adjacent. And striping will not be possible, generally.

The important part of striping is that a write is spread out over multiple devices, isn't it? If ZFS can choose where to put each block that it writes, it can easily choose to write a series of blocks to a collection of different devices, thus getting the major benefit of striping.

I see 2 major benefits of striping: one is that many drives are involved, and the other is that the stripes are allocated adjacent, so that IO on one drive can just proceed to the next physical blocks when one stripe has been processed. Dependent on the size of the IO operations involved, first one or more disks in a stripe are processed, and then the following stripes are processed. ZFS misses the second part of the optimization, I think.

Best regards
Keld
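The two striping benefits discussed above can be sketched with a toy address map. This is a hypothetical model of classic RAID0-style striping, not any actual md or ZFS code: consecutive chunks go round-robin across disks, and consecutive stripes land at consecutive physical offsets, so a sequential read both spreads across drives and stays physically adjacent on each one:

```python
# Toy model of RAID0-style striping: map a logical block to
# (disk, physical offset) for n_disks disks and chunk_blocks-sized chunks.
def stripe_map(logical_block, n_disks, chunk_blocks):
    chunk = logical_block // chunk_blocks
    disk = chunk % n_disks
    offset = (chunk // n_disks) * chunk_blocks + logical_block % chunk_blocks
    return disk, offset

# 4 disks, 2-block chunks: blocks 0..7 fill offsets 0..1 on every disk,
# so sequential IO is adjacent on each drive.
layout = [stripe_map(b, 4, 2) for b in range(8)]
print(layout)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), ..., (3, 1)]
```

A copy-on-write allocator that relocates rewritten blocks keeps the first benefit (many drives involved) but loses the fixed logical-to-physical adjacency the second benefit depends on, which is the concern raised in the message above.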
suns raid-z / zfs
Hi

Any opinions on Sun's zfs/raid-z? It seems like a good way to avoid the performance problems of raid-5/raid-6. But does it stripe? One could think that rewriting stripes other places would damage the striping effects. Or is the performance only meant to be good for random read/write?

Can the code be lifted to Linux? I understand that it is already in FreeBSD. Does Sun's licence prevent this?

And could something like this be built into existing file systems like ext3 and xfs? They could have a multipartition layer in their code, and then the heuristics to optimize block access could also apply to stripe access.

best regards
keld
patch for raid10,f1 to operate like raid0
This patch changes the disk to be read for layout far > 1 to always be the disk with the lowest block address. Thus the chunks to be read will always be (for a fully functioning array) from the first band of stripes, and the raid will then work as a raid0 consisting of the first band of stripes.

Some advantages:

The fastest part, which is the outer sectors of the disks involved, will be used. The outer blocks of a disk may be as much as 100% faster than the inner blocks.

Average seek time will be smaller, as seeks will always be confined to the first part of the disks.

Mixed disks with different performance characteristics will work better: as they will work as raid0, the sequential read rate will be the number of disks involved times the IO rate of the slowest disk.

If a disk is malfunctioning, the first disk which is working and has the lowest block address for the logical block will be used.

Signed-off-by: Keld Simonsen [EMAIL PROTECTED]

--- raid10.c	2008-02-12 00:50:59.0 +0100
+++ raid10-ks.c	2008-02-12 00:51:09.0 +0100
@@ -537,7 +537,7 @@
 	current_distance = abs(r10_bio->devs[slot].addr -
 		conf->mirrors[disk].head_position);
 
-	/* Find the disk whose head is closest */
+	/* Find the disk whose head is closest,
+	   or for far > 1 the one closest to the partition beginning */
 	for (nslot = slot; nslot < conf->copies; nslot++) {
 		int ndisk = r10_bio->devs[nslot].devnum;
@@ -557,7 +557,11 @@
 			slot = nslot;
 			break;
 		}
-		new_distance = abs(r10_bio->devs[nslot].addr -
+
+		/* for far > 1 always use the lowest address */
+		if (conf->far_copies > 1)
+			new_distance = r10_bio->devs[nslot].addr;
+		else
+			new_distance = abs(r10_bio->devs[nslot].addr -
 				conf->mirrors[ndisk].head_position);
 		if (new_distance < current_distance) {
 			current_distance = new_distance;
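The effect of the patch can be illustrated with a toy model (hypothetical, mine, not kernel code): in a raid10 "far" layout each logical block has one copy in the first band of a disk and another copy in a far band; always picking the copy with the lowest address confines reads to the first, outer band:

```python
# Toy model of the patch's read choice for raid10 far layouts:
# each logical block has several copies as (disk, address) pairs;
# pick the copy with the lowest address instead of the closest head.
def pick_copy(copies):
    return min(copies, key=lambda c: c[1])

# A block on a hypothetical 2-disk f2 array: one copy early on disk 0,
# the mirror in the far (second) half of disk 1.
copies = [(0, 5), (1, 55)]
print(pick_copy(copies))  # (0, 5) -- the low-address, outer-band copy
```

With all array members working, every read then lands in the first band, which is the raid0-like behaviour the patch description claims.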
my io testing scripts
Here are my testing scripts used in the performance howto: http://linux-raid.osdl.org/index.php/Home_grown_testing_methods

=Hard disk performance scripts=

Here are the scripts that I used for my performance measuring. Use at your own risk. They destroy the contents of the partitions involved. The /dev/md raid needs to be stopped before initiating the test. Copyright Keld Simonsen, [EMAIL PROTECTED] 2008. Licensed under the GPL. (The redirection and comparison characters below were lost in the original posting and have been restored.)

iotest:

#!/bin/sh
# invoked by
# iotest "mdadm -R -C /dev/md1 --chunk=256 -l 10 -n 2 -p f2" /dev/md1 /mnt/md1 ext3 "/dev/hdb5 /dev/hdd5"
echo "\n $1 $5 \n" >> /tmp/results
echo $1 $5
$1 $5
mkfs -t $4 $2
mkdir $3
mount $2 $3
cd $3
echo "\nmakefiles\n" >> /tmp/results
mkfiles 200
echo "\n remakefiles \n" >> /tmp/results
mkfiles 200
echo "\n catall \n" >> /tmp/results
cat * > /dev/null
echo "\n catnull \n" >> /tmp/results
catnull
cd
umount $2
mdadm -S $2
echo "\n finish $1 $5 \n" >> /tmp/results

Be careful with this script, and remember to change the ordinary test to only one partition.

iorun:

#!/bin/sh
# set up ram disk
DISKS="/dev/sda2 /dev/sdb2"
iostat -k 10 >> /tmp/results &
iotest "" /dev/sda2 /mnt/sda2 ext3
iotest "mdadm -C /dev/md1 --chunk=256 -R -l 0 -n 2" /dev/md1 /mnt/md1 ext3 "$DISKS"
iotest "mdadm -C /dev/md1 --chunk=256 -R -l 1 -n 2" /dev/md1 /mnt/md1 ext3 "$DISKS"
iotest "mdadm -C /dev/md1 --chunk=256 -R -l 10 -n 2" /dev/md1 /mnt/md1 ext3 "$DISKS"
iotest "mdadm -C /dev/md1 --chunk=256 -R -l 10 -n 2 -p f2" /dev/md1 /mnt/md1 ext3 "$DISKS"
# iotest "mdadm -C /dev/md1 --chunk=256 -R -l 10 -n 2 -p o2" /dev/md1 /mnt/md1 ext3 "$DISKS"

mkfiles:

#!/bin/sh
for (( i = 1; i <= $1 ; i++ )) ; do dd if=/dev/hda1 of=$i bs=1MB count=40 ; done

catnull:

#!/bin/tcsh
foreach i ( * )
cat $i > /dev/null &
end
wait
howto on performance
I have put up a new howto text on performance: http://linux-raid.osdl.org/index.php/Performance#Performance_of_raids_with_2_disks

Enjoy!
Keld

=Performance of raids with 2 disks=

I have made some testing of performance of different types of RAIDs, with 2 disks involved. I have used my own home grown testing methods, which are quite simple, to test sequential and random reading and writing of 200 files of 40 MB. The tests were meant to see what performance I could get out of a system mostly oriented towards file serving, such as a mirror site.

My configuration was:

1800 MHz AMD Sempron(tm) Processor 3100+
1500 MB RAM
2 x Hitachi Ultrastar SCSI-II 1 TB
Linux version 2.6.12-26mdk

Figures are in MB/s, and the file system was ext3. Times were measured with iostat, and an estimate for steady performance was taken. The times varied quite a lot over the different 10 second intervals; for example the estimate 155 MB/s ranged from 135 MB/s to 163 MB/s. I then looked at the average over the period when a test was running in full scale (all processes started, and none stopped).

RAID type       sequential read   random read   sequential write   random write
Ordinary disk          82              34              67               56
RAID0                 155              80              97               80
RAID1                  80              35              72               55
RAID10                 79              56              69               48
RAID10,f2             150              79              70               55

Random read for RAID1 and RAID10 was quite unbalanced, almost only coming out of one of the disks.

The results are quite as expected: RAID0 and RAID10,f2 reads are double speed compared to an ordinary file system for sequential reads (155 vs 82) and more than double for random reads (80 vs 35). Writes (both sequential and random) are roughly the same for ordinary disk, RAID1, RAID10 and RAID10,f2: around 70 MB/s for sequential, and 55 MB/s for random. Sequential reads are about the same (80 MB/s) for ordinary partition, RAID1 and RAID10. Random reads for ordinary partition and RAID1 are about the same (35 MB/s) and about 50% higher for RAID10. I am puzzled why RAID10 is faster than RAID1 here.
All in all RAID10,f2 is the fastest mirrored RAID for both sequential and random reading for this test, while it is about equal with the other mirrored RAIDs when writing. My kernel did not allow me to test RAID10,o2 as this is only supported from kernel 2.6.18.
Re: howto and faq
On Sun, Feb 10, 2008 at 10:05:13AM +, David Greaves wrote:
Keld Jørn Simonsen wrote:

The list description at http://vger.kernel.org/vger-lists.html#linux-raid does list a FAQ, http://www.linuxdoc.org/FAQ/

Yes, that should be amended. Drop them a line about the FAQ too

I will.

So our FAQ info is pretty out of date. I think it would be nice to have a wiki like we have for the Howto. This would mean that we have much better means to let new people make their mark, and avoid the problem that we have today with really outdated info.

There seems to be no point in having separate wikis for the FAQ and HOWTO elements of documentation. Especially since a lot of FAQs are "How do I..." questions; by definition the answer is a HOWTO.

So can we put up a wiki somewhere for this, or should we just extend the wiki howto pages to also include a faq section?

So just extend the existing wiki.

OK, so let's have a combined howto and faq. I would then like that to be reflected in the main page. I would rather that this be called "Howto and FAQ - Linux raid" than "Main Page - Linux Raid". Is that possible? And then, how do we structure the pages? I think we need a new section for the FAQ. And then I would like a clearer statement on the relation between the linux-raid mailing list and the pages, right at the top of the main page.

I set the wiki up at osdl to ensure that if a bus hit me then Neil or others would have a rational and responsive organisation to go to to change ownership. I've been writing to some of the other FAQ/Doc organisations sporadically for over a year now and had no response from any of them. It's a very poor aspect of OSS...

Looks like a good move.

I have had a look at other search engines, yahoo and msn. Our pages do show up within the 10 first hits for linux raid. So that is not that bad. Still, Google has the http://linux-raid.osdl.org/ page as number 127. That is very bad. Maybe something could be done about it by having it referenced from wikipedia?
Best regards
Keld
Re: howto and faq
On Sun, Feb 10, 2008 at 06:21:08PM +, David Greaves wrote:
Keld Jørn Simonsen wrote:

I would then like that to be reflected in the main page. I would rather that this be called "Howto and FAQ - Linux raid" than "Main Page - Linux Raid". Is that possible?

Just like C has a main(), wikis have a Main Page :) I guess it could be changed but I think it involves editing the Mediawiki config - maybe next time I'm in there...

OK, good.

And then, how do we structure the pages? I think we need a new section for the FAQ.

By all means create an FAQ page and link to answers or other relevant sections of the wiki. Bear in mind that this is a reference work, and whilst it may contain tutorials, the idea is that it contains (reasonably) authoritative information about the linux raid subsystem (linking to the source, kernel docs or man pages if that's more appropriate).

Yes, I will be conservative and robust in what I write there.

And then I would like a clearer statement on the relation between the linux-raid mailing list and the pages, right at the top of the main page.

The relationship is loose - the statement as it stands describes the current state of affairs. If Neil feels that he could or would like to help the case by declaring a more official relationship then that's his call. To be fair I work on these pages on and off as the mood takes me :) If I was Neil I'd be keeping an eye on it and waiting for the right level of community involvement.

OK, I will only state something like the usual FAQ thing: please consult the FAQ before submitting questions to the list.

I have had a look at other search engines, yahoo and msn. Our pages do show up within the 10 first hits for linux raid. So that is not that bad. Still, Google has the http://linux-raid.osdl.org/ page as number 127. That is very bad. Maybe something could be done about it by having it referenced from wikipedia?
I'm not an expert at gaming the search engines - more than happy to do rational things like linking from Wikipedia and other reference sites. I am sad that I've had such a poor response from the other linux documentation sites... maybe a Slashdot article not so much about doc-rot but about the difficulty of combating doc-rot would help... Maybe they'd take more notice if I said "the linux raid subsystem maintainer says..." - dunno.

I think we should just contact some more people... And then do some linking ourselves.

Best regards
Keld
Re: raid5: two writing algorithms
On Fri, Feb 08, 2008 at 12:51:39PM +1100, Neil Brown wrote:
On Friday February 8, [EMAIL PROTECTED] wrote:
On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote:
On Thursday February 7, [EMAIL PROTECTED] wrote:

So I hereby give the idea for inspiration to kernel hackers.

and I hereby invite you to read the code ;-)

I did some reading. Is there somewhere a description of it, especially the raid code, or are the comments and the code the best documentation?

No. If a description was written (and various people have tried to describe various parts) it would be out of date within a few months :-(

OK, I was under the impression that some of the code did not change much. Eg. you said that there had not been any work on optimizing raid10 for performance since the 2.6.12 kernel I was using. And then in the raid5 code, the last copyright notice right at the top is Copyright (C) 2002, 2003 H. Peter Anvin. That is 5 years ago. And your name is not on it. So I did not look that much into that code, thinking nothing had been done there for ages. Maybe you could add your name to it, that would only be fair. The same comment goes for other modules (for which it is relevant).

Look for READ_MODIFY_WRITE and RECONSTRUCT_WRITE

no. That only applies to the raid6 code now. Look instead for the 'rcw' and 'rmw' counters, and then at 'handle_write_operations5' which does different things based on the 'rcw' variable. It used to be a lot clearer before we implemented xor-offload. The xor-offload stuff is good, but it does make the code more complex.

OK, I think it is fairly well documented here, I can at least follow the logic, and then I think it is a good approach to have the flow description/strategy included directly in the code. Given there are many changes to the code, separate files for code and description could easily mix up the alignment of code and documentation badly.

Do you say that this is already implemented?

Yes.

That is very good!
Do you know if other implementations of this, eg. commercial controller code, have this facility? If not, we could list this as an advantage of linux raid. Anyway it would be implicit in performance documentation. I do plan to write up something on performance, soonish. The howto is hopelessly outdated.

IMHO such code should make the performance of raid5 random writes not that bad. Better than the reputation that raid5 is hopelessly slow for database writing. I think raid5 would be less than twice as slow as raid1 for random writing.

Well, I do have a hack in mind, on the raid10,f2. I need to investigate some more, and possibly test out what really happens. But maybe the code already does what I want it to. You are possibly the one that knows the code best, so maybe you can tell me if raid10,f2 always does its reading in the first part of the disks?

Yes, I know the code best. No, raid10,f2 doesn't always use the first part of the disk. Getting it to do that would be a fairly small change in 'read_balance' in md/raid10.c. I'm not at all convinced that the read balancing code in raid10 (or raid1) really does the best thing. So any improvements - backed up with broad testing - would be most welcome.

I think I know where to do my proposed changes, and how it could be done. So maybe in a not too distant future I will have done my first kernel hack!

Best regards
keld
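The rmw-versus-rcw choice discussed above can be illustrated by counting I/Os. This is a hedged sketch of the idea only (it counts every block as an I/O and ignores blocks already in the stripe cache, which the real raid5 code does take into account): rewriting k chunks of a stripe with n data disks.

```python
# Read-modify-write ("XOR-subtract old, XOR-add new"): read each changed
# chunk and the parity, write each changed chunk and the parity.
def rmw_ios(k):
    return 2 * k + 2

# Reconstruct-write: read the untouched chunks, write the new chunks
# plus the recomputed parity.
def rcw_ios(n, k):
    return (n - k) + k + 1

# Random single-chunk write on a 6-data-disk array: rmw wins.
print(rmw_ios(1), rcw_ios(6, 1))   # 4 7
# Full-stripe write: rcw needs no reads at all and wins.
print(rmw_ios(6), rcw_ios(6, 6))   # 14 7
```

Picking whichever count is smaller per stripe is precisely the kind of decision the 'rmw' and 'rcw' counters mentioned above feed into.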
Re: draft howto on making raids for surviving a disk crash
On Thu, Feb 07, 2008 at 09:05:04AM +0100, Luca Berra wrote:
On Wed, Feb 06, 2008 at 04:45:39PM +0100, Keld Jørn Simonsen wrote:
On Wed, Feb 06, 2008 at 10:05:58AM +0100, Luca Berra wrote:
On Sat, Feb 02, 2008 at 08:41:31PM +0100, Keld Jørn Simonsen wrote:

Make each of the disks bootable by lilo:
lilo -b /dev/sda /etc/lilo.conf1
lilo -b /dev/sdb /etc/lilo.conf2

There should be no need for that. To achieve the above effect with lilo you use raid-extra-boot=mbr-only in lilo.conf

Make each of the disks bootable by grub: install grub with the command grub-install /dev/md0

I have already changed the text on the wiki. Still I am not convinced it is the best advice that is described.

lilo -b /dev/md0 (without a raid-extra-boot line in lilo.conf) will install lilo on the boot sector of the partitions containing /dev/md0 (and it will break with 1.1 sb)

I think 1.1 superblocks will break all boots with lilo and grub, but 1.1 superblocks are not standard in current distributions. When would 1.1 superblocks be a problem for new users of raid?

for grub, do you have any doubt about the grub-install script not working correctly?

No, I think the grub description is OK. I only meant the lilo description.

Best regards
keld
Re: recommendations for stripe/chunk size
On Thu, Feb 07, 2008 at 06:40:12AM +0100, Iustin Pop wrote:
On Thu, Feb 07, 2008 at 01:31:16AM +0100, Keld Jørn Simonsen wrote:

Anyway, why does a SATA-II drive not deliver something like 300 MB/s?

Wait, are you talking about a *single* drive?

Yes, I was talking about a single drive.

In that case, it seems you are confusing the interface speed (300 MB/s) with the mechanical read speed (80 MB/s).

I thought the 300 MB/s was the transfer rate between the disk and the controller's memory in its buffers, but you indicate that this is the speed between the controller's buffers and main RAM.

I am, as Neil, amazed by the speeds that we get on current hardware, but still I would like to see if we could use the hardware better. Asynchronous IO could be a way forward. I have written some mainframe utilities where asynchronous IO was the key to the performance, so I thought that it could also become handy in the Linux kernel. If about 80 MB/s is the maximum we can get out of a current SATA-II 7200 rpm drive, then I think there is not much to be gained from asynchronous IO.

If you are asking why a single drive is limited to 80 MB/s, I guess it's a problem of mechanics. Even with NCQ or big readahead settings, ~80-~100 MB/s is the highest I've seen on 7200 RPM drives. And yes, there is no wait until the CPU processes the current data before the drive reads the next data; drives have a builtin read-ahead mechanism. Honestly, I have 10x as many problems with the low random I/O throughput as with the (high, IMHO) sequential I/O speed.

I agree that random IO is the main factor on most server installations. But on workstations the sequential IO is also important, as the only user is sometimes waiting for the computer to respond. And then I think that booting can benefit from faster sequential IO. And not to forget, I think it is fun to make my hardware run faster!
best regards
Keld
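The interface-versus-media distinction made above can be put into numbers. This is a rough model with assumed figures (the track size is a guess, not from the thread): a 7200 rpm drive can only deliver the bytes that pass under the head per revolution, regardless of the 300 MB/s SATA-II link speed.

```python
# Rough sustained-transfer model: bytes per revolution times
# revolutions per second, independent of the interface speed.
def media_rate_mb_s(rpm, track_mb):
    revs_per_sec = rpm / 60.0
    return revs_per_sec * track_mb

# 7200 rpm with an assumed ~0.7 MB per track on the outer cylinders:
print(round(media_rate_mb_s(7200, 0.7), 1))  # 84.0 -- close to the observed ~80 MB/s
```

So the ~80 MB/s observed is about what the platters can physically supply; the 300 MB/s figure is only the burst rate out of the drive's buffer.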
howto and faq
Hi

I am trying to get some order to linux raid info. I think we should have a faq and a howto for the linux-raid list.

The list description at http://vger.kernel.org/vger-lists.html#linux-raid does list a FAQ, http://www.linuxdoc.org/FAQ/ I cannot read it just now - the server www.linuxdoc.org does not respond. I then tried the google archive - which had no info, and then the internet archive, which for the latest entry on this had some notes on Debian and GFDL - quite irrelevant.

There are other FAQs that claim to be the FAQ for linux-raid. One is http://www.faqs.org/contrib/linux-raid/ which is quite extensive, but from 2003 (about 5 years old).

So our FAQ info is pretty out of date. I think it would be nice to have a wiki like we have for the Howto. This would mean that we have much better means to let new people make their mark, and avoid the problem that we have today with really outdated info. So can we put up a wiki somewhere for this, or should we just extend the wiki howto pages to also include a faq section?

For the howto, I have asked the VGER people to add info to our list description, that we have a wiki howto at http://linux-raid.osdl.org/ I believe it to be the fact that this howto is our official howto. I have added a remark at the top of the text hinting that this wiki howto is the official howto of the linux-raid list, though I did not state it as such.

Hope this gives some clarity to the situation.

best regards
keld
raid5: two writing algorithms
As I understand it, there are 2 valid algorithms for writing in raid5.

1. Calculate the parity data by XOR'ing all data of the relevant data chunks.

2. Calculate the parity data by kind of XOR-subtracting the old data to be changed, and then XOR-adding the new data. (XOR-subtract and XOR-add is actually the same.)

There are situations where method 1 is the fastest, and situations where method 2 is the fastest. My idea is then that the raid5 code in the kernel can calculate which method is the faster.

Method 1 is faster if all data is already available. I understand that this method is employed in the current kernel. This would eg be the case with sequential writes.

Method 2 is faster if no data is available in core. It would require 2 reads and two writes, which always will be faster than n reads and 1 write, possibly except for n=2. Method 2 is thus faster normally for random writes. I think that method 2 is not used in the kernel today. Maybe I am wrong, but I did have a look in the kernel code.

So I hereby give the idea for inspiration to kernel hackers.

Yoyr kernel hacker wannabe
keld
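The two methods above can be demonstrated in a few lines. This sketch (mine, with made-up chunk values) shows that the "XOR-subtract then XOR-add" update of method 2 yields exactly the same parity as recomputing it from all chunks with method 1:

```python
from functools import reduce
from operator import xor

# Method 1: parity from scratch -- XOR of all data chunks in the stripe.
def parity_full(chunks):
    return reduce(xor, chunks)

# Method 2: read-modify-write -- XOR out the old chunk, XOR in the new
# one (XOR-subtract and XOR-add really are the same operation).
def parity_rmw(old_parity, old_chunk, new_chunk):
    return old_parity ^ old_chunk ^ new_chunk

chunks = [0b1010, 0b0110, 0b1111]
p = parity_full(chunks)                     # 0b0011
new_chunk = 0b0001
updated = [new_chunk, 0b0110, 0b1111]
assert parity_rmw(p, 0b1010, new_chunk) == parity_full(updated)
print(bin(parity_full(updated)))            # 0b1000
```

Both methods agree; they differ only in which blocks have to be read first, which is what makes one or the other faster depending on what is already in core.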
Re: raid5: two writing algorithms
On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote:
On Thursday February 7, [EMAIL PROTECTED] wrote:

As I understand it, there are 2 valid algorithms for writing in raid5.

1. Calculate the parity data by XOR'ing all data of the relevant data chunks.

2. Calculate the parity data by kind of XOR-subtracting the old data to be changed, and then XOR-adding the new data. (XOR-subtract and XOR-add is actually the same.)

There are situations where method 1 is the fastest, and situations where method 2 is the fastest. My idea is then that the raid5 code in the kernel can calculate which method is the faster. Method 1 is faster if all data is already available. I understand that this method is employed in the current kernel. This would eg be the case with sequential writes. Method 2 is faster if no data is available in core. It would require 2 reads and two writes, which always will be faster than n reads and 1 write, possibly except for n=2. Method 2 is thus faster normally for random writes. I think that method 2 is not used in the kernel today. Maybe I am wrong, but I did have a look in the kernel code.

It is very odd that you would think something about the behaviour of the kernel without actually having looked. It also seems a little arrogant to have a clever idea and assume that no one else has thought of it before.

Oh well, I have to admit that I do not understand the code fully. I am not a seasoned kernel hacker, as I also indicated in my ad hoc signature.

So I hereby give the idea for inspiration to kernel hackers.

and I hereby invite you to read the code ;-)

I did some reading. Is there somewhere a description of it, especially the raid code, or are the comments and the code the best documentation? Do you say that this is already implemented?

I am sorry if you think I am mailing too much on the list. But I happen to think it is fun. And I do try to give something back.
Code reading is a good first step to being a

Yoyr kernel hacker wannabe
^
NeilBrown

Well, I do have a hack in mind, on the raid10,f2. I need to investigate some more, and possibly test out what really happens. But maybe the code already does what I want it to. You are possibly the one that knows the code best, so maybe you can tell me if raid10,f2 always does its reading in the first part of the disks?

best regards
keld
Re: Purpose of Document? (was Re: draft howto on making raids for surviving a disk crash)
On Wed, Feb 06, 2008 at 08:24:37AM -0600, Moshe Yudkowsky wrote:

I read through the document, and I've signed up for a Wiki account so I can edit it. One of the things I wanted to do was correct the title. I see that there are *three* different Wiki pages about how to build a system that boots from RAID. None of them are complete yet. So, what is the purpose of this page?

I think the purpose is a complete description of how to use RAID to build a system that not only boots from RAID but is robust against other hazards such as file system corruption.

You are right that there is more than one wiki page addressing very related issues. I also considered whether there was a need for the new page, and discussed it with David. And yes, my idea was to make a howto on building a system that can survive a disk crash. A simple system that can also work for a workstation. In fact the main audience is possibly here. So my focus is: survive a failing disk, and keep it simple.

Best regards
Keld
Re: draft howto on making raids for surviving a disk crash
On Wed, Feb 06, 2008 at 10:05:58AM +0100, Luca Berra wrote:
On Sat, Feb 02, 2008 at 08:41:31PM +0100, Keld Jørn Simonsen wrote:

Make each of the disks bootable by lilo:
lilo -b /dev/sda /etc/lilo.conf1
lilo -b /dev/sdb /etc/lilo.conf2

There should be no need for that. To achieve the above effect with lilo you use raid-extra-boot=mbr-only in lilo.conf

Make each of the disks bootable by grub: install grub with the command grub-install /dev/md0

I have already changed the text on the wiki. Still I am not convinced it is the best advice that is described.

best regards
keld
Re: raid1 or raid10 for /boot
On Wed, Feb 06, 2008 at 01:52:11PM -0500, Bill Davidsen wrote:
Keld Jørn Simonsen wrote:

I understand that lilo and grub can only boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1?

I don't know about you, but my /boot gets zero use between boots; efficiency and performance improvements strike me as a distinction without a difference, while adding complexity without benefit is always a bad idea. I suggest that you avoid having a learning experience and stick with raid1.

I agree with you, it was only a theoretical question.

Best regards
keld
Re: recommendations for stripe/chunk size
On Wed, Feb 06, 2008 at 09:25:36PM +0100, Wolfgang Denk wrote: In message [EMAIL PROTECTED] you wrote: I actually think the kernel should operate with block sizes like this and not with 4 kiB blocks. It is the readahead and the elevator algorithms that save us from randomly reading 4 kB at a time. Exactly, and nothing saves us from a R-A-RW (read-alter-rewrite) cycle if the write is a partial chunk. Indeed kernel page size is an important factor in such optimizations. But you have to keep in mind that this is mostly efficient for (very) large strictly sequential I/O operations only - actual file system traffic may be *very* different. We implemented the option to select kernel page sizes of 4, 16, 64 and 256 kB for some PowerPC systems (440SPe, to be precise). A nice graph of the effect can be found here: https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf Yes, that is also what I would expect, for sequential reads. Random writes of small data blocks, kind of what is done in big databases, should show another picture, as others also have described. If you look at a single disk, would you get improved performance with asynchronous IO? I am a bit puzzled about my SATA-II performance: nominally I could get 300 MB/s on SATA-II, but I only get about 80 MB/s. Why is that? I thought it was because of latency with synchronous reads. I.e., when a chunk is read, you need to complete the IO operation, and then issue a new one. In the meantime, while the CPU is doing these calculations, the disk has spun a little, and to get the next data chunk, we need to wait for the disk to spin around to have the head positioned over the right data place on the disk surface. Is that so? Or does the controller take care of this, reading the rest of the not-yet-requested track into a buffer, which can then be delivered next time? Modern disks often have buffers of about 8 or 16 MB. I wonder why they don't have bigger buffers.
Anyway, why does a SATA-II drive not deliver something like 300 MB/s? best regards keld
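A quick back-of-the-envelope check on the latency explanation above (a sketch with assumed spindle speeds, not measurements): rotational delay alone already caps small synchronous reads far below the interface rate.

```python
def rotation_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm

# Average rotational latency is half a revolution: the head arrives over
# the track at a random rotational position.
for rpm in (5400, 7200, 10000):
    print(f"{rpm} rpm: {rotation_ms(rpm):.1f} ms/rev, "
          f"{rotation_ms(rpm) / 2:.1f} ms average latency")

# If every synchronous 4 KiB read pays ~4 ms of rotational latency alone,
# the ceiling is about 1 MB/s -- nowhere near the 300 MB/s SATA-II link rate.
print(f"ceiling: {4 / 1024 / (rotation_ms(7200) / 2 / 1000):.2f} MB/s")
```

This is why sequential throughput (track buffering, readahead) is so much better than the synchronous random-read figure: the mechanical delay, not the SATA link, is the bottleneck.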
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
On Thu, Jan 31, 2008 at 02:55:07AM +0100, Keld Jørn Simonsen wrote: On Wed, Jan 30, 2008 at 11:36:39PM +0100, Janek Kozicki wrote: Keld Jørn Simonsen said: (by the date of Wed, 30 Jan 2008 23:00:07 +0100) All the raid10's will have double time for writing, and raid5 and raid6 will also have double or triple writing times, given that you can do striped writes on the raid0. For raid5 and raid6 I think this is even worse. My take is that for raid5, when you write something, you first read the chunk data involved, then you read the parity data, then you xor-subtract the data to be changed, and you xor-add the new data, and then write the new data chunk and the new parity chunk. In total 2 reads and 2 writes. The reads/writes happen on the same chunks, so latency is minimized. But in essence it is still 4 IO operations, where it is only 2 writes on raid1/raid10; that is only half the speed for writing on raid5 compared to raid1/10. On raid6 this amounts to 6 IO operations, resulting in 1/3 of the writing speed of raid1/10. I note in passing that there is no difference between xor-subtract and xor-add. Also I assume that you can calculate the parities of both raid5 and raid6 given the old parity chunks and the old and new data chunks. If you have to calculate the new parities by reading all the component data chunks, this is going to be really expensive, both in IO and CPU. For a 10 drive raid5 this would involve reading 9 data chunks, making writes 5 times as expensive as raid1/10. best regards keld
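The IO accounting in the mail above can be written out as a toy tally (my own sketch, assuming read-modify-write parity updates from the old data and old parity, as the text describes):

```python
def small_write_ios(level):
    """IO operations to update a single chunk, read-modify-write style."""
    if level in ("raid1", "raid10"):
        return {"reads": 0, "writes": 2}   # just write both mirror copies
    if level == "raid5":
        return {"reads": 2, "writes": 2}   # old data + old parity in, new data + new parity out
    if level == "raid6":
        return {"reads": 3, "writes": 3}   # old data + P + Q in, all three rewritten
    raise ValueError(f"unknown level: {level}")

for lvl in ("raid1", "raid5", "raid6"):
    ops = small_write_ios(lvl)
    print(lvl, ops, "-> total", ops["reads"] + ops["writes"], "IOs")
```

The totals (2 vs. 4 vs. 6) are what give the 1/2 and 1/3 write-speed ratios claimed above.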
recommendations for stripe/chunk size
Hi I am looking at revising our howto. I see a number of places where a chunk size of 32 kiB is recommended, and even recommendations on maybe using sizes of 4 kiB. My own take on that is that this really hurts performance. Normal disks have a rotation speed of between 5400 (laptop), 7200 (ide/sata) and 10000 (SCSI) rotations per minute, giving a spinning time for one round of 6 to 12 ms, and an average latency of half this, that is 3 to 6 ms. Then you need to add head movement, which is something like 2 to 20 ms - in total an average seek time of 5 to 26 ms, averaging around 13-17 ms. In about 15 ms you can read, on current SATA-II (300 MB/s) or ATA/133, something like 600 to 1200 kB, given actual transfer rates of 80 MB/s on SATA-II and 40 MB/s on ATA/133. So to get some bang for the buck, and transfer some data, you should have something like 256/512 kiB chunks. With a transfer rate of 50 MB/s and chunk sizes of 256 kiB, giving a time of about 20 ms per transaction, you should be able with random reads to transfer 12 MB/s - my actual figure is about 30 MB/s, which is possibly because of the elevator effect of the file system driver. With a size of 4 kiB per chunk you should have a time of 15 ms per transaction, or 66 transactions per second, or a transfer rate of about 260 kB/s. So 256 kiB vs 4 kiB chunks speed up the transfer by a factor of almost 50. I actually think the kernel should operate with block sizes like this and not with 4 kiB blocks. It is the readahead and the elevator algorithms that save us from randomly reading 4 kB at a time. I also see that there are some memory constraints on this. Having maybe 1000 processes reading, as for my mirror service, 256 kiB buffers would be acceptable, occupying 256 MB RAM. That is reasonable, and I could even tolerate 512 MB RAM used. But going to 1 MiB buffers would be overdoing it for my configuration. What would be the recommended chunk size for today's equipment?
Best regards Keld
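The arithmetic in the posting above can be checked mechanically. A sketch using its round numbers (15 ms average access time, 50 MB/s media transfer rate; these are the text's assumptions, not measurements):

```python
def random_read_rate_mb_s(chunk_kib, access_ms, media_mb_s):
    """Sustained rate of back-to-back random reads of one chunk each:
    each read pays the full access time plus the media transfer time."""
    transfer_ms = chunk_kib / 1024.0 / media_mb_s * 1000.0
    seconds_per_op = (access_ms + transfer_ms) / 1000.0
    return chunk_kib / 1024.0 / seconds_per_op

big = random_read_rate_mb_s(256, 15.0, 50.0)   # 256 KiB chunks
small = random_read_rate_mb_s(4, 15.0, 50.0)   # 4 KiB chunks
print(f"256 KiB: {big:.1f} MB/s, 4 KiB: {small:.2f} MB/s, "
      f"ratio ~{big / small:.0f}x")
# -> 256 KiB: 12.5 MB/s, 4 KiB: 0.26 MB/s, ratio ~48x
```

The ratio comes out at roughly 48x, matching the "factor of almost 50" claimed.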
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
On Tue, Feb 05, 2008 at 11:54:27AM -0500, Justin Piszcz wrote: On Tue, 5 Feb 2008, Keld Jørn Simonsen wrote: [...] On my benchmarks RAID5 gave the best overall speed with 10 raptors, although I did not play with the various offsets/etc as much as I have tweaked the RAID5. Could you give some figures?
best regards keld
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
On Tue, Feb 05, 2008 at 05:28:27PM -0500, Justin Piszcz wrote: Could you give some figures? I remember testing with bonnie++ and raid10 was about half the speed (200-265 MiB/s) of RAID5 (400-420 MiB/s) for sequential output, but input was closer to RAID5 speeds/did not seem affected (~550MiB/s). Impressive. What level of raid10 was involved? And what type of equipment, how many disks? Maybe the better output for raid5 could be due to some striping - AFAIK raid5 will be striping quite well, and write speed almost equal to read speed indicates that the writes are striping too. best regards keld
Re: raid1 or raid10 for /boot
On Mon, Feb 04, 2008 at 09:17:35AM +, Robin Hill wrote: On Mon Feb 04, 2008 at 07:34:54AM +0100, Keld Jørn Simonsen wrote: I understand that lilo and grub only can boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1? A plain RAID-10 does _not_ have a layout equivalent to RAID-1 and _cannot_ be used for booting (well, possibly a 2-disk RAID-10 could - I'm not sure how that'd be laid out). RAID-10 uses striping as well as mirroring, and the striping breaks both grub and lilo (and, AFAIK, every other boot manager currently out there). Yes, it is understood that raid10,f2 uses striping, but a raid10 with near=2, far=1 does not use striping, and this is what you get if you just run: mdadm --create /dev/md0 -l 10 -n 2 /dev/sda1 /dev/sdb1 best regards keld
Re: draft howto on making raids for surviving a disk crash
On Sun, Feb 03, 2008 at 10:53:51AM -0500, Bill Davidsen wrote: Keld Jørn Simonsen wrote: This is intended for the linux raid howto. Please give comments. It is not fully ready /keld Howto prepare for a failing disk 6. /etc/mdadm.conf Something here on /etc/mdadm.conf. What would be safe, allowing a system to boot even if a disk has crashed? Recommend PARTITIONS be used Thanks Bill for your suggestions, which I have incorporated in the text. However, I do not understand what to do with the remark above. Please explain. Best regards keld
Re: raid1 and raid 10 always writes all data to all disks?
On Sun, Feb 03, 2008 at 10:56:01AM -0500, Bill Davidsen wrote: Keld Jørn Simonsen wrote: I found a sentence in the HOWTO: raid1 and raid 10 always writes all data to all disks I think this is wrong for raid10. eg a raid10,f2 of 4 disks only writes to two of the disks - not all 4 disks. Is that true? I suspect that really should have read all mirror copies, in the raid10 case. OK, I changed the text to: raid1 always writes all data to all disks. raid10 always writes all data to the number of copies that the raid holds. For example on a raid10,f2 or raid10,o2 of 6 disks, the data will only be written 2 times. Best regards Keld
raid1 or raid10 for /boot
I understand that lilo and grub only can boot partitions that look like a normal single-drive partition. And then I understand that a plain raid10 has a layout which is equivalent to raid1. Can such a raid10 partition be used with grub or lilo for booting? And would there be any advantages in this, for example better disk utilization in the raid10 driver compared with raid1? best regards keld
draft howto on making raids for surviving a disk crash
This is intended for the linux raid howto. Please give comments. It is not fully ready /keld

Howto prepare for a failing disk

The following will describe how to prepare a system to survive if one disk fails. This can be important for a server which is intended to always run. The description is mostly aimed at small servers, but it can also be used for workstations, to protect them from losing data and to keep them running even if a disk fails. Some recommendations on larger server setups are given at the end of the howto. This requires some extra hardware, especially disks, and the description will also touch on how to make the most out of the disks, be it in terms of available disk space or input/output speed.

1. Creating partitions

We recommend creating partitions for /boot, root, swap and other file systems. This can be done by fdisk, parted or maybe a graphical interface like the Mandriva/PCLinuxOS harddrake2. It is recommended to use drives with equal sizes and performance characteristics. If we are using the 2 drives sda and sdb, then sfdisk may be used to mark all the partitions as raid partitions:

sfdisk -c /dev/sda 1 fd
sfdisk -c /dev/sda 2 fd
sfdisk -c /dev/sda 3 fd
sfdisk -c /dev/sda 5 fd
sfdisk -c /dev/sdb 1 fd
sfdisk -c /dev/sdb 2 fd
sfdisk -c /dev/sdb 3 fd
sfdisk -c /dev/sdb 5 fd

Using:

fdisk -l /dev/sda /dev/sdb

The partition layout could then look like this:

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start     End      Blocks  Id System
/dev/sda1           1      37      297171  fd Linux raid autodetect
/dev/sda2          38    1132     8795587+ fd Linux raid autodetect
/dev/sda3        1133    1619     3911827+ fd Linux raid autodetect
/dev/sda4        1620  121601   963755415   5 Extended
/dev/sda5        1620  121601   963755383+ fd Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start     End      Blocks  Id System
/dev/sdb1           1      37      297171  fd Linux raid autodetect
/dev/sdb2          38    1132     8795587+ fd Linux raid autodetect
/dev/sdb3        1133    1619     3911827+ fd Linux raid autodetect
/dev/sdb4        1620  121601   963755415   5 Extended
/dev/sdb5        1620  121601   963755383+ fd Linux raid autodetect

2. Prepare for boot

The system should be set up to boot from multiple devices, so that if one disk fails, the system can boot from another disk. On Intel hardware, there are two common boot loaders, grub and lilo. Both grub and lilo can only boot off a raid1; they cannot boot off any other software raid device type. The reason they can boot off the raid1 is that they see the raid1 as a normal disk; they then only use one of the disks when booting. The boot stage only involves loading the kernel with an initrd image, so not much data is needed for this. The kernel, the initrd and other boot files can be put in a small /boot partition. We recommend something like 200 MB on an ext3 raid1.

Make the raid1 and the ext3 filesystem:

mdadm --create /dev/md0 --chunk=256 -R -l 1 -n 2 /dev/sda1 /dev/sdb1
mkfs -t ext3 /dev/md0

Make each of the disks bootable by lilo:

lilo -b /dev/sda /etc/lilo.conf1
lilo -b /dev/sdb /etc/lilo.conf2

Make each of the disks bootable by grub (to be described)

3. The root file system

The root file system can be on another raid than the /boot partition. We recommend a raid10,f2, as the root file system will mostly be reads, and the raid10,f2 raid type is the fastest for reads, while also being sufficiently fast for writes. Other relevant raid types would be raid10,o2 or raid1. It is recommended to use udev, as this runs in RAM, and you thus can avoid a number of reads and writes to disk. It is recommended that all file systems are mounted with the noatime option; this avoids writing to the filesystem inodes every time a file has been read or written.

Make the raid10,f2 and the ext3 filesystem:

mdadm --create /dev/md1 --chunk=256 -R -l 10 -n 2 -p f2 /dev/sda2 /dev/sdb2
mkfs -t ext3 /dev/md1

4.
The swap file system

If a disk that processes are swapped to fails, then all these processes fail. These may be vital processes for the system, or vital jobs on the system. You can prevent the failure of these processes by having the swap partitions on a raid. The swap area needed is normally relatively small compared to the overall disk space available, so we recommend the faster raid types over the more space-economical ones. The raid10,f2 type seems to be the fastest here; other relevant raid types could be raid10,o2 or raid1. Given that you have created a raid array, you can just make the swap partition
Re: draft howto on making raids for surviving a disk crash
On Sat, Feb 02, 2008 at 09:32:54PM +0100, Janek Kozicki wrote: Keld Jørn Simonsen said: (by the date of Sat, 2 Feb 2008 20:41:31 +0100) This is intended for the linux raid howto. Please give comments. It is not fully ready /keld very nice. do you intend to put it on http://linux-raid.osdl.org/ Yes, that is the intention. As a wiki, it will be much easier for our community to fix errors and add updates. Agreed. But I will not put it up before I am sure it is reasonably flawless, i.e. it will at least work. I have found a few errors myself already. best regards keld
Re: RAID 1 and grub
On Wed, Jan 30, 2008 at 06:47:19PM -0800, David Rees wrote: On Jan 30, 2008 6:33 PM, Richard Scobie [EMAIL PROTECTED] wrote: FWIW, this step is clearly marked in the Software-RAID HOWTO under Booting on RAID: http://tldp.org/HOWTO/Software-RAID-HOWTO-7.html#ss7.3 A good and extensive reference, but somewhat outdated. BTW, I suspect you are missing the command setup from your 3rd command above, it should be:

# grub
grub> device (hd0) /dev/hdc
grub> root (hd0,0)
grub> setup (hd0)

I do not grasp this. How and where is it said that two disks are involved? hda and hdc should both be involved. Best regards keld
raid1 and raid 10 always writes all data to all disks?
I found a sentence in the HOWTO: raid1 and raid 10 always writes all data to all disks I think this is wrong for raid10. eg a raid10,f2 of 4 disks only writes to two of the disks - not all 4 disks. Is that true? best regards keld
Re: In this partition scheme, grub does not find md information?
On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote: Michael Tokarev wrote: With 5-drive linux raid10:

 A  B  C  D  E
 0  0  1  1  2
 2  3  3  4  4
 5  5  6  6  7
 7  8  8  9  9
10 10 11 11 12
...

AB can't be removed - losing 0, 5. AC CAN be removed, as can AD. But not AE - losing 2 and 7. And so on. I see. Does the kernel code allow this? And mdadm? And can B+E be removed safely, and C+E and B+D? best regards keld
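The pair-by-pair reasoning can be automated. A sketch (my reconstruction of the near-2 placement shown in the table above, not the md driver's code) that builds the chunk-to-disk map for 5 drives and reports which 2-disk failures lose data:

```python
from itertools import combinations

def near2_layout(ndisks, nchunks):
    """Chunks held by each disk under near=2: each chunk is written to
    two consecutive disk slots, wrapping around row by row."""
    disks = [set() for _ in range(ndisks)]
    slot = 0
    for chunk in range(nchunks):
        for _ in range(2):          # two copies of each chunk
            disks[slot % ndisks].add(chunk)
            slot += 1
    return disks

disks = near2_layout(5, 13)
names = "ABCDE"
for a, b in combinations(range(5), 2):
    lost = sorted(disks[a] & disks[b])   # chunks with both copies gone
    print(names[a] + names[b], "fails, losing" if lost else "survives", lost or "")
```

Running it confirms the cases quoted (AB and AE fail, AC and AD survive) and answers the question at the end: B+E, C+E and B+D all hold disjoint chunk sets, so the data would still be complete.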
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
On Wed, Jan 30, 2008 at 07:21:33PM +0100, Janek Kozicki wrote: Hello, Yes, I know that some levels give faster reading and slower writing, etc. I want to talk here about a typical workstation usage: compiling stuff (like kernel), editing openoffice docs, browsing web, reading email (email: I have a webdir format, and in boost mailing list directory I have 14000 files (posts), opening this directory takes circa 10 seconds in sylpheed). Moreover, opening .pdf files, more compiling of C++ stuff, etc... I have a remote backup system configured (with rsnapshot), which does backups two times a day. So I'm not afraid to lose all my data due to disc failure. I want absolute speed. Currently I have Raid-0, because I was thinking that this one is fastest. But I also don't need twice the capacity. I could use Raid-1 as well, if it was faster. Due to recent discussion about Raid-10,f2 I'm getting worried that Raid-0 is not the fastest solution, but instead Raid-10,f2 is faster. So how really is it, which level gives maximum overall speed? I would like to make a benchmark, but currently, technically, I'm not able to. I'll be able to do it next month, and then - as a result of this discussion - I will switch to another level and post benchmark results here. How does overall performance change with the number of available drives? Perhaps Raid-0 is best for 2 drives, while Raid-10 is best for 3, 4 and more drives? Theoretically, raid0 and raid10,f2 should be the same for reading, given the same size of the md partition, etc. For writing, raid10,f2 should be half the speed of raid0. This should go both for sequential and random reads/writes. But I would like to have real test numbers. best regards keld
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
On Wed, Jan 30, 2008 at 11:36:39PM +0100, Janek Kozicki wrote: Keld Jørn Simonsen said: (by the date of Wed, 30 Jan 2008 23:00:07 +0100) Theoretically, raid0 and raid10,f2 should be the same for reading, given the same size of the md partition, etc. For writing, raid10,f2 should be half the speed of raid0. This should go both for sequential and random reads/writes. But I would like to have real test numbers. Me too. Thanks. Are there any other raid levels that may count here? Raid-10 with some other options? Given that you want maximum throughput for both reading and writing, I think there is only one way to go, that is raid0. All the raid10's will have double time for writing, and raid5 and raid6 will also have double or triple writing times, given that you can do striped writes on the raid0. For random and sequential writing in the normal case (no faulty disks) I would guess that all of the raid10's, the raid1 and raid5 are about equally fast, given the same amount of hardware (raid5 and raid6 a little slower, given the inactive parity chunks). For random reading, raid0, raid1 and raid10 should be equally fast, with raid5 a little slower, due to one of the disks being virtually out of operation, as it is used for the XOR parity chunks. raid6 should be somewhat slower due to 2 non-operational disks. raid10,f2 may have a slight edge due to virtually only using half of each disk, giving better average seek time, and using the faster outer disk halves. For sequential reading, raid0 and raid10,f2 should be equally fast. Possibly raid10,o2 comes quite close. My guess is that raid5 then is next, achieving striping rates, but with the loss of one parity drive, and then raid1 and raid10,n2 with equal performance.
In degraded mode, I guess for random reads/writes the difference is not big between any of the raid1, raid5 and raid10 layouts, while sequential reads will be especially bad for raid10,f2, approaching the random read rate, while the others will enjoy the normal speed of the filesystem above (ext3, reiserfs, xfs etc). Theory, theory, theory. Show me some real figures. Best regards Keld
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: Linux raid10 MODULE (which implements that standard raid10 LEVEL in full) adds some quite.. unusual extensions to that standard raid10 LEVEL. The resulting layout is also called raid10 in linux (ie, not giving new names), but it's not that raid10 (which is again the same as raid1+0) as commonly known in various literature and on the internet. Yet the raid10 module fully implements the STANDARD raid10 LEVEL. My understanding is that you can have a linux raid10 of only 2 drives, while the standard RAID 1+0 requires 4 drives, so this is a huge difference. I am not sure what properties vanilla linux raid10 (near=2, far=1) has. I think it can run with only 1 disk, but I think it does not have striping capabilities. It would be nice to have more info on this, eg in the man page. Is there an official web page for mdadm? And maybe the raid faq could be updated? best regards keld
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 05:07:27PM +0300, Michael Tokarev wrote: Peter Rabbitson wrote: Moshe Yudkowsky wrote: It is exactly what the name implies - a new kind of RAID :) The setup you describe is not RAID10, it is RAID1+0. Raid10 IS RAID1+0 ;) It's just that the linux raid10 driver can utilize more.. interesting ways to lay out the data. My understanding is that raid10 is different from RAID1+0. Traditional RAID1+0 is composed of two RAID1's combined into one RAID0. It takes 4 drives to make it work. Linux raid10 only takes 2 drives to work. Traditional RAID1+0 only has one way of laying out the blocks. raid10 has a number of ways to do the layout, namely the near, far and offset ways, layout=n2, f2, o2 respectively. Traditional RAID1+0 can only do striping across half of the disks involved, while raid10 can do striping on all disks in the far and offset layouts. I looked around on the net for documentation of this. The first hits (on Google) for mdadm did not have descriptions of raid10. Wikipedia describes raid 10 as a synonym for raid1+0. I think there is too much confusion on the raid10 term, and also that the marvelous linux raid10 layouts are a little known secret beyond maybe the circles of this linux-raid list. We should tell others more about the wonders of raid10. And then I would like a good reference describing how raid10,o2 works and why bigger chunks work. Best regards keld
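To make the near/far/offset distinction concrete, here is a small illustration (my own sketch, following the layout descriptions in md(4), not the md driver's code) that prints which chunk lands on which disk for a 4-drive array with 2 copies; chunk numbers are shown per disk column, one stripe row per line:

```python
def show(title, grid):
    print(title)
    for row in grid:
        print("  " + " ".join(f"{c:2d}" for c in row))

n, stripes = 4, 2  # 4 disks, 2 copies of each chunk

# near=2: the two copies sit side by side in the same stripe row
near = [[(r * n + d) // 2 for d in range(n)] for r in range(stripes)]

# far=2: the first half of each disk is striped raid0-style; the second
# half repeats every stripe shifted one disk to the right
far = [[r * n + d for d in range(n)] for r in range(stripes)] + \
      [[r * n + (d - 1) % n for d in range(n)] for r in range(stripes)]

# offset=2: each stripe is immediately followed by a copy of itself,
# shifted one disk to the right
offset = []
for s in range(stripes):
    offset.append([s * n + d for d in range(n)])
    offset.append([s * n + (d - 1) % n for d in range(n)])

show("near=2 (n2)", near)
show("far=2 (f2)", far)
show("offset=2 (o2)", offset)
```

The far and offset layouts keep a full raid0-style stripe of distinct chunks on every row, which is why they can read sequentially at striping speed, while near=2 reads like raid1 pairs.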
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 05:02:57AM -0600, Moshe Yudkowsky wrote: Neil, thanks for writing. A couple of follow-up questions to you and the group: If the answers above don't lead to a resolution, I can create two RAID1 pairs and join them using LVM. I would take a hit by using LVM to tie the pairs instead of RAID0, I suppose, but I would avoid the performance hit of multiple md drives on a single physical drive, and I could even run a hot spare through a sparing group. Any comments on the performance hit -- is raid1L a really bad idea for some reason? You can of course construct a traditional raid-1+0 in Linux as you describe here, but this is different from linux raid10 (with its different layout possibilities). And constructing two grub/lilos on two disks for a raid1 on /boot seems to be the right way for a reasonably secured system. best regards keld
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 09:57:48AM -0600, Moshe Yudkowsky wrote: In my 4 drive system, I'm clearly not getting 1+0's ability to use grub out of the RAID10. I expect it's because I used 1.2 superblocks (why not use the latest, I said, foolishly...) and therefore the RAID10 -- with even number of drives -- can't be read by grub. If you'd patch that information into the man pages that'd be very useful indeed. If you have 4 drives, I think the right thing is to use a raid1 with 4 drives for your /boot partition. Then you can survive a crash of 3 disks! If you want the extra performance, then I think you should not bother too much about the kernel and initrd load time - which of course is not striped over the disks, but some performance improvement can be expected. Then you can have the rest of /root on a raid10,f2 with 4 disks. best regards keld
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 07:51:07PM +0300, Michael Tokarev wrote: Peter Rabbitson wrote: [] However if you want to be so anal about names and specifications: md raid 10 is not a _full_ 1+0 implementation. Consider the textbook scenario with 4 drives: (A mirroring B) striped with (C mirroring D) When only drives A and C are present, md raid 10 with near offset will not start, whereas standard RAID 1+0 is expected to keep clunking away. Ugh. Yes. offset is a linux extension. But md raid 10 with the default n2 (without offset) configuration will behave exactly like in the classic docs. I would like to understand this fully. What Peter described for md raid10, md raid 10 with near offset, I believe is vanilla raid10 without any options (or near=2, far=1). Will that not start if we are unlucky to have 2 drives failing, but we are lucky that the two remaining drives actually hold all the data? Same question for a raid10,f2 array. I think it would be easy to investigate, when the number of drives is even, whether all data is present, and then happily run an array with some failing disks. Say for a 4 drive raid10,f2 disks A and D are failing; then all data should be present on drives B and C, given that A and C have the even chunks, and B and D have the odd chunks. Likewise for a 6 drive array, etc, for all multiples of 2, with f2. best regards keld
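Which pairs of failed drives still leave every chunk readable can be checked with a small simulation. This sketch assumes one particular f2 convention (the second copy of each stripe shifted one drive to the right, as md(4) describes); note that under that convention it is the alternating pairs (A+C, B+D) whose chunk sets are disjoint, not A+D, so treat the output as an illustration of the method rather than a statement about the driver:

```python
from itertools import combinations

def far2_disk_chunks(ndisks, stripes):
    """Chunks held per disk under far=2: a raid0-style first half, plus a
    second half where each stripe is rotated one disk to the right."""
    held = [set() for _ in range(ndisks)]
    for r in range(stripes):
        for d in range(ndisks):
            held[d].add(r * ndisks + d)                  # first copy
            held[d].add(r * ndisks + (d - 1) % ndisks)   # rotated second copy
    return held

held = far2_disk_chunks(4, 3)
names = "ABCD"
for a, b in combinations(range(4), 2):
    status = "fails" if held[a] & held[b] else "survives"
    print(names[a] + "+" + names[b], status)
```

The exact surviving pairs depend on the direction of the rotation, which is why checking against the real layout (or the md code) matters before running an array degraded like this.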
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 07:46:58PM +0300, Michael Tokarev wrote: Keld Jørn Simonsen wrote: On Tue, Jan 29, 2008 at 06:13:41PM +0300, Michael Tokarev wrote: Linux raid10 MODULE (which implements that standard raid10 LEVEL in full) adds some quite.. unusual extensions to that standard raid10 LEVEL. The resulting layout is also called raid10 in linux (ie, not giving new names), but it's not that raid10 (which is again the same as raid1+0) as commonly known in various literature and on the internet. Yet the raid10 module fully implements the STANDARD raid10 LEVEL. My understanding is that you can have a linux raid10 of only 2 drives, while the standard RAID 1+0 requires 4 drives, so this is a huge difference. Ugh. A 2-drive raid10 is effectively just a raid1. I.e., mirroring without any striping. (Or, backwards, striping without mirroring). OK. Uhm, well, I did not understand: (Or, backwards, striping without mirroring). I don't think a 2 drive vanilla raid10 will do striping. Please explain. Pretty much like with raid5 of 2 disks - it's the same as raid1. I think in a raid5 of 2 disks, half of the chunks are parity chunks which are evenly distributed over the two disks, and the parity chunk is the XOR of the data chunk. But maybe I am wrong. Also the behaviour of such a raid5 is different from a raid1, as the parity chunk is not used as data. I am not sure what properties vanilla linux raid10 (near=2, far=1) has. I think it can run with only 1 disk, but I think the number of copies should be = the number of disks, so no. I have a clear understanding that in a vanilla linux raid10 (near=2, far=1) you can run with one failing disk, that is with only one working disk. Am I wrong? does not have striping capabilities. It would be nice to have more info on this, eg in the man page. It's all in there really. See md(4). Maybe it's not that verbose, but it's not a user's guide (as in: a large book), after all. Some man pages have examples. Or info could be written in the faq or in wikipedia.
Best regards keld - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
linux raid faq
Hmm, I read the Linux raid FAQ at http://www.faqs.org/contrib/linux-raid/x37.html It looks pretty outdated, referring to how to patch 2.2 kernels and mentioning neither the new mdadm nor raid10. It was not dated. It seemed to be related to the linux-raid list, telling where to find archives of the list. Maybe time for an update? Or is this not the right place to write stuff? When I searched Google for "raid faq", the first 5-7 hits did not mention raid10. Maybe Wikipedia is the way to go? I did contribute a little there myself. The Software RAID HOWTO is dated v1.1, 3rd of June 2004, http://unthought.net/Software-RAID.HOWTO/Software-RAID.HOWTO.html - also pretty old. best regards keld
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 01:34:37PM -0600, Moshe Yudkowsky wrote: I'm going to convert back to the RAID 1 setup I had before for /boot, 2 hot and 2 spare across four drives. No, that's wrong: 4 hot makes the most sense. And given that RAID 10 doesn't seem to confer (for me, as far as I can tell) advantages in speed or reliability -- or the ability to mount just one surviving disk of a mirrored pair -- over RAID 5, I think I'll convert back to RAID 5, put in a hot spare, and do regular backups (as always). Oh, and use reiserfs with data=journal. Hmm, my idea was to use a raid10,f2 4 disk raid for the /root, or an o2 layout. I think it would offer quite some speed advantage over raid5. At least on a 4 disk raid5 I only got random performance of about 130 MB/s, while the raid10 gave 180-200 MB/s. Also sequential read was significantly faster on raid10. I do think I can get about 320 MB/s on the raid10,f2, but I need a bigger power supply to support my disks before I can go on testing. The key here is bigger readahead. I only got 150 MB/s for raid5 sequential reads. I think the sequential read could be significant for the boot time, and then for the single user running on the system, namely the system administrator (=me), even under reasonable load. I would be interested if you would experiment with this wrt boot time, for example the difference between /root on a raid5, raid10,f2 and raid10,o2. Comments back: Mr. Tokarev wrote: By the way, on all our systems I use a small (256Mb for small-software systems, sometimes 512M, but 1G should be sufficient) partition for a root filesystem (/etc, /bin, /sbin, /lib, and /boot), and put it on a raid1 on all... ... doing [it] this way, you always have all the tools necessary to repair a damaged system even in case your raid didn't start, or you forgot where your root disk is etc etc. An excellent idea. I was going to put just /boot on the RAID 1, but there's no reason why I can't add a bit more room and put them all there.
(Because I was having so much fun on the install, I'm using 4GB that I was going to use for swap space to mount the base install, and I'm working from there to build the RAID. Same idea.) If you put more than /boot on the raid1, then you will not get the added performance of raid10 for all your system utilities. I am not sure about redundancy, but a raid1 and a raid10 should be equally vulnerable to a 1-disk failure. If you use a 4 disk raid1 for /root, then of course you can survive 3 disk crashes. I am not sure that 4 disks in a raid1 for /root give added performance, as grub only sees the /root raid1 as a normal disk, but maybe some kind of remounting makes it get its raid behaviour. Also, placing /dev on a tmpfs helps a lot to minimize the number of writes necessary for the root fs. I thought of using the noatime mount option for /root. best regards Keld
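The scheme Mr. Tokarev describes above could be sketched like this (device names and partition sizes are assumptions; run as root):

```
# A small 4-way raid1 for the root filesystem -- every member holds a
# complete copy, so the bootloader can read any of them as a plain disk.
mdadm --create /dev/md0 --level=1  --raid-devices=4 /dev/sd[abcd]1
# A raid10,f2 over the large remaining partitions for the data filesystems,
# where the far layout's sequential-read speed actually pays off.
mdadm --create /dev/md1 --level=10 --raid-devices=4 --layout=f2 /dev/sd[abcd]2
```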
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 04:14:24PM -0600, Moshe Yudkowsky wrote: Keld Jørn Simonsen wrote: Based on your reports of better performance on RAID10 -- which are more significant than I'd expected -- I'll just go with RAID10. The only question now is if LVM is worth the performance hit or not. Hmm, LVM for what purpose? For the root system, I think it is not an issue. Just have a large enough partition; it is not more than 10-20 GB anyway, which is around 1% of the disk sizes that we talk about today with new disks in raids. I would be interested if you would experiment with this wrt boot time, for example the difference between /root on a raid5, raid10,f2 and raid10,o2. According to man md(4), the o2 is likely to offer the best combination of read and write performance. Why would you consider f2 instead? I have no experience with o2, and little experience with f2. But I kind of designed f2. I have not fully grasped o2 yet. But my take is that writes would be random writes, and that is almost the same for all layouts. However, when/if a disk is faulty, f2 has considerably worse performance for sequential reads, approximating the performance of random reads, which in some cases is about half the speed of sequential reads. For sequential reads and random reads I think f2 would be faster than o2, due to the smaller average seek times, and use of the faster part of the disk. I am still wondering how o2 gets to do striping; I don't understand it given the layout schemes I have seen. F2 OTOH is designed for striping. I would like to see some figures, though. My testing environment is, as said, not operational right now, but will possibly be OK later this week. I'm unlikely to do any testing beyond running bonnie++ or something similar once it's installed. I do some crude testing like reading 1000 files of 20 MB concurrently, and then just cat'ing a 4 GB file to /dev/null. The RAM cache needs to be unable to hold the files.
Looking at boot times could also be interesting. I would like as little downtime as possible. But it depends on your purpose and thus pattern of use. Many systems tend to be read-oriented, and for that I think f2 is the better alternative. best regards keld
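The crude sequential-read test described above ("cat a file to /dev/null") can be sketched with dd, which also reports the throughput. The target path is an assumption; on a real test, point it at a multi-GB file on the array so the page cache cannot hold it:

```shell
# Crude sequential-read throughput test along the lines described above.
# TARGET is an assumption -- point it at a large file on the md array.
TARGET=${TARGET:-/tmp/seqread-test.$$}
# Create a 64 MB test file if none exists (small here just for illustration;
# use several GB in real use so the page cache cannot hold it).
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=1M count=64 2>/dev/null
# Read it back; dd prints the achieved throughput on stderr.
dd if="$TARGET" of=/dev/null bs=1M
```

Between runs the page cache should be dropped (e.g. `echo 3 > /proc/sys/vm/drop_caches` as root), otherwise the second read measures RAM, not disks.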
Re: In this partition scheme, grub does not find md information?
On Tue, Jan 29, 2008 at 06:32:54PM -0600, Moshe Yudkowsky wrote: Hmm, why would you put swap on a raid10? I would in a production environment always put it on separate swap partitions, possibly a number of them, given that a number of drives are available. In a production server, however, I'd use swap on RAID in order to prevent server downtime if a disk fails -- a suddenly bad swap can easily (will absolutely?) cause the server to crash (even though you can boot the server up again afterwards on the surviving swap partitions). I see. Which file system type would be good for this? I normally use XFS but maybe another FS is better, given that swap is used very randomly (read/write). Will a bad swap crash the system? best regards keld
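The two approaches above can be sketched as config fragments (partition and device names are assumptions). Separate swap partitions with equal priority are striped across by the kernel; swap on an md array trades that for surviving a disk failure:

```
# /etc/fstab sketch: equal pri= values make the kernel interleave pages
# across the swap partitions (fast, but a dead disk loses live swap).
/dev/sda2  none  swap  sw,pri=1  0  0
/dev/sdb2  none  swap  sw,pri=1  0  0

# Alternative: swap on a mirrored md device (survives a disk failure).
#   mkswap /dev/md2 && swapon /dev/md2
```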
Re: striping of a 4 drive raid10
On Mon, Jan 28, 2008 at 01:32:48PM -0500, Bill Davidsen wrote: Neil Brown wrote: On Sunday January 27, [EMAIL PROTECTED] wrote: Hi I have tried to make a striping raid out of my new 4 x 1 TB SATA-2 disks. I tried raid10,f2 in several ways: 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0 of md0+md1 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2 of md0+md1 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid0 of md0+md1 chunksize = 256 KB 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1 Try 6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1 Also try raid10,o2 with a largeish chunksize (256KB is probably big enough). Looking at the issues raised, there might be some benefit from having the mirror chunks on the slower inner tracks of a raid10, and reading from the outer tracks when the drives with the data on the outer tracks are idle. This would appear to offer a transfer rate benefit overall. Hmm, how do I do this? I think this is the normal behaviour of a raid10,f2. Is that so? So you mean I should rather use f2 than o2? Or should I configure the f2 in some way? My hdparm -t gives:

/dev/sda5:
 Timing buffered beginning disk reads: 82 MB in 1.00 seconds = 81.686 MB/sec
 Timing buffered ending disk reads:    42 MB in 1.03 seconds = 40.625 MB/sec
 Average seek time 13.714 msec, min=4.641, max=23.921
 Average track-to-track time 28.151 msec, min=26.729, max=28.730

So, yes, there is a reason to use the faster outer tracks - and to have the faster access time that f2 gives. How does o2 behave here? Does it read and seek on the whole disk? As to your other comments in another mail, I could of course install a newer kernel and mdadm, but then I would lose the support of my supported and paid system.
And Neil said that there have been no performance fixes for f2 since the kernel I use (2.6.12). I thought that o2 support had been included since 2.6.10 - but apparently not so. Best regards keld
striping of a 4 drive raid10
Hi I have tried to make a striping raid out of my new 4 x 1 TB SATA-2 disks. I tried raid10,f2 in several ways: 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0 of md0+md1 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2 of md0+md1 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid0 of md0+md1 chunksize = 256 KB 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1 My new disks give a transfer rate of about 80 MB/s, so I expected something like 320 MB/s for the whole raid, but I did not get more than about 180 MB/s. I think it may be something with the layout, that in effect the blocks on the drives should be something like:

sda1 sdb1 sdc1 sdd1
  0    1    2    3
  4    5    6    7

And this was not really doable with those combinations of raids, because those combinations give different block layouts. How can it be done? Do we need a new raid type? Best regards keld
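One variant not in the list above, which as far as I can tell gives exactly the block layout sketched, is a single raid10,f2 across all four drives rather than stacked md devices. A sketch (device names and chunk size are assumptions; run as root):

```
# One far-layout array over all 4 disks: the first copy of the data is
# striped 0 1 2 3 / 4 5 6 7 across the outer halves, with the mirror
# copies rotated into the inner halves.
mdadm --create /dev/md0 --level=10 --raid-devices=4 --layout=f2 \
      --chunk=256 /dev/sd[abcd]1
```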
Re: striping of a 4 drive raid10
On Mon, Jan 28, 2008 at 07:13:30AM +1100, Neil Brown wrote: On Sunday January 27, [EMAIL PROTECTED] wrote: Hi I have tried to make a striping raid out of my new 4 x 1 TB SATA-2 disks. I tried raid10,f2 in several ways: 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0 of md0+md1 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2 of md0+md1 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid0 of md0+md1 chunksize = 256 KB 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1 Try 6: md0 = raid10,f2 of sda1+sdb1+sdc1+sdd1 That I already tried (and I wrongly stated that I used f4 instead of f2). I twice got a throughput of about 300 MB/s, but since then I could not reproduce the behaviour. Are there errors in this that have been corrected in newer kernels? Also try raid10,o2 with a largeish chunksize (256KB is probably big enough). I tried that too, but my mdadm did not allow me to use the o flag. My kernel is 2.6.12 and mdadm is v1.12.0 - 14 June 2005. Can I upgrade mdadm alone to a newer version, and then which one is recommendable? best regards keld
Re: striping of a 4 drive raid10
On Sun, Jan 27, 2008 at 08:11:35PM +0000, Peter Grandi wrote: On Sun, 27 Jan 2008 20:33:45 +0100, Keld Jørn Simonsen [EMAIL PROTECTED] said: keld Hi I have tried to make a striping raid out of my new 4 x keld 1 TB SATA-2 disks. I tried raid10,f2 in several ways: keld 1: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, md2 = raid0 keld of md0+md1 keld 2: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, md2 = raid01,f2 keld of md0+md1 keld 3: md0 = raid10,f2 of sda1+sdb1, md1= raid10,f2 of sdc1+sdd1, chunksize of keld md0 = md1 = 128 KB, md2 = raid0 of md0+md1 chunksize = 256 KB keld 4: md0 = raid0 of sda1+sdb1, md1= raid0 of sdc1+sdd1, chunksize keld of md0 = md1 = 128 KB, md2 = raid01,f2 of md0+md1 chunksize = 256 KB These stacked RAID levels don't make a lot of sense. keld 5: md0= raid10,f4 of sda1+sdb1+sdc1+sdd1 This also does not make a lot of sense. Why have four mirrors instead of two? My error, I did mean f2. Anyway, 4 mirrors would make reads 2 times faster than with 2 disks, and given disk prices these days this could make a lot of sense. Instead, try 'md0 = raid10,f2' for example. The first mirror will be striped across the outer half of all four drives, and the second mirror will be rotated in the inner half of each drive. Which of course means that reads will be quite quick, but writes and degraded operation will be slower. Consider this post for more details: http://www.spinics.net/lists/raid/msg18130.html Thanks for the reference. There is also more in the original article on possible layouts of what is now known as raid10,f2 http://marc.info/?l=linux-raid&m=107427614604701&w=2 including performance enhancements due to use of the faster outer sectors, and smaller average seek times because you can seek on only half the disk. best regards keld
hdparm patch with min/max transfer rate, and min/avg/max access times
Hi I have made some patches to hdparm to report min/max transfer rates, and min/avg/max access times. Enjoy! http://std.dkuug.dk/keld/hdparm-7.7-ks.tar.gz Best regards keld
performance of raid10,f2 on 4 disks
Hi! I have played around with raid10,f2 on a 2 disk array, and I really liked the performance on sequential reads. It looked like it doubled the speed, about 173 MB/s for two SATA-2 disks. I then went on to look at my 4 new SATA-2 disks; to get the same kind of performance I made the array with: mdadm --create /dev/md3 --chunk=256 -R -l 10 -n 4 -p f2 /dev/sd[abcd]1 And my first tests showed a sequential read rate of 320 MB/s. Impressive! I then tried it a few more times, but then I could not get more than around 160 MB/s, which is less than what I got on 2 disks. Any ideas of what is going on? Best regards keld
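Earlier in these threads a bigger readahead is named as the key to good raid10,f2 sequential reads, which is one thing worth checking when the 320 MB/s result will not reproduce. A sketch of tuning it (values and device names are assumptions; run as root):

```
# Large readahead on the array device, small on the member disks
# (sector counts; 16384 sectors = 8 MB).
blockdev --setra 16384 /dev/md3
for d in /dev/sd[abcd]1; do
    blockdev --setra 256 "$d"
done
blockdev --getra /dev/md3    # verify the setting
```

Also drop the page cache between test runs (`echo 3 > /proc/sys/vm/drop_caches`), since a cached rerun measures RAM rather than the array.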