Re: How many drives are bad?
Peter Grandi wrote:
> In general, I'd use RAID10 (http://WWW.BAARF.com/), RAID5 in ...

Interesting movement. What do you think is their stance on Raid Fix? :)

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: LVM performance
Oliver Martin wrote:
> Interesting. I'm seeing a 20% performance drop too, with default RAID and
> LVM chunk sizes of 64K and 4M, respectively. Since 64K divides 4M evenly,
> I'd think there shouldn't be such a big performance penalty.

I am no expert, but as far as I have read you must not only have compatible
chunk sizes (which is easy and most often the case). You must also stripe
align the LVM chunks, so that every chunk spans a whole number of raid
stripes (not raid chunks).

Check the output of `dmsetup table`. The last number is the offset into the
underlying block device at which the LVM data portion starts. It must be
divisible by the raid stripe length (the length varies for different raid
types).

Currently LVM does not offer an easy way to do such alignment; you have to
do it manually when executing pvcreate. With the option --metadatasize one
can specify the size of the area between the LVM header (64KiB) and the
start of the data area. So one would supply STRIPE_SIZE - 64 for the
metadatasize [*], and the result will be a stripe-aligned LVM.

This information is unverified, I just compiled it from different list
threads and whatnot. I did this to my own arrays/volumes and I get near 100%
of the raw speed. If someone else can confirm the validity of this - it
would be great.

Peter

[*] The supplied number is always rounded up to be divisible by 64KiB, so
the smallest total LVM header is at least 128KiB
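To make the recipe above concrete, here is a small sketch (helper name and structure are mine, not LVM's) of the --metadatasize arithmetic, assuming a 4-disk RAID5 with 64 KiB chunks (3 data disks per stripe):

```python
# Hypothetical helper: compute a --metadatasize value (in KiB) so that the
# 64 KiB LVM header plus the metadata area add up to one full raid stripe,
# making the PV data area start on a stripe boundary.

LVM_HEADER_KIB = 64    # size of the LVM label/header at the start of the PV
ROUND_KIB = 64         # pvcreate rounds --metadatasize up to a 64 KiB multiple

def aligned_metadatasize_kib(chunk_kib, data_disks):
    """Return a --metadatasize (KiB) so header + metadata = one raid stripe."""
    stripe_kib = chunk_kib * data_disks        # raid stripe length in KiB
    want = stripe_kib - LVM_HEADER_KIB         # STRIPE_SIZE - 64, per the post
    # round up to what pvcreate would actually allocate
    return -(-want // ROUND_KIB) * ROUND_KIB

# 4-disk RAID5 (3 data disks), 64 KiB chunk: stripe = 192 KiB
print(aligned_metadatasize_kib(64, 3))   # -> 128
```

With that value, header (64) + metadata (128) = 192 KiB, exactly one stripe, so the data area is aligned.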
Re: which raid level gives maximum overall speed? (raid-10,f2 vs. raid-0)
Janek Kozicki wrote:
> writing on raid10 is supposed to be half the speed of reading. That's
> because it must write to both mirrors.

I am not 100% certain about the following rules, but afaik any raid
configuration has a theoretical [1] maximum read speed of the combined speed
of all disks in the array, and a maximum write speed equal to the combined
speed of a disk-length of a stripe. By disk-length I mean how many disks are
needed to reconstruct a single stripe - the rest of the writes are
redundancy and are essentially non-accountable work. For raid5 it is N-1.
For raid6 - N-2. For linux raid 10 it is N-C+1, where C is the number of
chunk copies. So for -p n3 -n 5 we would get a maximum write speed of 3 x
single drive speed. For raid1 the disk-length of a stripe is always 1.

So the statement:

> IMHO raid5 could perform well here, because in *continuous* write
> operation the blocks from other HDDs have just been written, they stay in
> cache and can be used to calculate xor. So you could get close to almost
> raid-0 performance here.

is quite incorrect. You will get close to raid-0 if you have many disks, but
will never beat raid0, since one disk is always busy writing parity, which
is not part of the write request submitted to the mdX device in the first
place.

[1] Theoretical, since any external factors (busy CPU, unsuitable elevator,
random disk access, multiple raid levels on one physical device) all
contribute to take you further away from the maximums.
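The rules of thumb above can be written down as a quick sketch (function names are mine; these are theoretical ceilings in multiples of a single drive's speed, not benchmark results):

```python
# Sketch of the post's rules: theoretical maximum read/write throughput of an
# N-disk array, expressed as a multiple of one drive's sequential speed.

def max_read_factor(n):
    # any level: in theory every disk can be serving reads at once
    return n

def max_write_factor(level, n, copies=2):
    # "disk-length of a stripe": disks carrying non-redundant data per stripe
    if level == 0:
        return n
    if level == 1:
        return 1           # every disk writes a full copy
    if level == 5:
        return n - 1       # one disk's worth of parity per stripe
    if level == 6:
        return n - 2       # two disks' worth of parity per stripe
    if level == 10:
        return n - copies + 1   # linux raid10 with C chunk copies
    raise ValueError(f"no rule for level {level}")

# the -p n3 -n 5 example from the post:
print(max_write_factor(10, 5, copies=3))   # -> 3
```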
Re: Deleting mdadm RAID arrays
Marcin Krol wrote:
> On Tuesday 05 February 2008 21:12:32 Neil Brown wrote:
>>> % mdadm --zero-superblock /dev/sdb1
>>> mdadm: Couldn't open /dev/sdb1 for write - not zeroing
>>
>> That's weird. Why can't it open it?
>
> Hell if I know. First time I see such a thing.
>
>> Maybe you aren't running as root (the '%' prompt is suspicious).
>
> I am running as root, the % prompt is the obfuscation part (I have
> configured bash to display the IP as part of the prompt).
>
>> Maybe the kernel has been told to forget about the partitions of
>> /dev/sdb.
>
> But fdisk/cfdisk has no problem whatsoever finding the partitions.
>
>> mdadm will sometimes tell it to do that, but only if you try to assemble
>> arrays out of whole components. If that is the problem, then
>> blockdev --rereadpt /dev/sdb
>
> I deleted the LVM devices that were sitting on top of RAID and reinstalled
> mdadm.
>
> % blockdev --rereadpt /dev/sdf
> BLKRRPART: Device or resource busy
> % mdadm /dev/md2 --fail /dev/sdf1
> mdadm: set /dev/sdf1 faulty in /dev/md2
> % blockdev --rereadpt /dev/sdf
> BLKRRPART: Device or resource busy
> % mdadm /dev/md2 --remove /dev/sdf1
> mdadm: hot remove failed for /dev/sdf1: Device or resource busy
>
> lsof /dev/sdf1 gives ZERO results.

What does this say:

dmsetup table
Re: RAID 1 and grub
Bill Davidsen wrote:
> Richard Scobie wrote:
>> A followup for the archives: I found this document very useful:
>> http://lists.us.dell.com/pipermail/linux-poweredge/2003-July/008898.html
>>
>> After modifying my grub.conf to refer to (hd0,0), reinstalling grub on
>> hdc with:
>>
>> grub> device (hd0) /dev/hdc
>> grub> root (hd0,0)
>> grub> setup (hd0)
>>
>> and rebooting with the bios set to boot off hdc, everything burst back
>> into life. I shall now be checking all my Fedora/Centos RAID1 installs
>> for grub installed on both drives.
>
> Have you actually tested this by removing the first hd and booting?
> Depending on the BIOS I believe that the fallback drive will be called hdc
> by the BIOS but will be hdd in the system. That was with RHEL3, but worth
> testing.

The line:

grub> device (hd0) /dev/hdc

simply means "treat /dev/hdc as the first _bios_ hard disk in the system".
This way when grub writes to the MBR of hd0, it will in fact be writing to
/dev/hdc. The reason the drive must be referenced as hd0 (and not hd2) is
that grub enumerates drives according to the bios, and therefore the drive
from which the bios is currently booting is _always_ hd0.
WRONG INFO (was Re: In this partition scheme, grub does not find md information?)
Moshe Yudkowsky wrote:
> over the other. For example, I've now learned that if I want to set up a
> RAID1 /boot, it must actually be 1.2 or grub won't be able to read it. (I
> would therefore argue that if the new version ever becomes default, then
> the default sub-version ought to be 1.2.)

In the discussion yesterday I myself made a serious typo that should not
spread. The only superblock version that will work with current GRUB is
1.0, _not_ 1.2.
Re: In this partition scheme, grub does not find md information?
Peter Rabbitson wrote:
> Moshe Yudkowsky wrote:
>> Here's a baseline question: if I create a RAID10 array using default
>> settings, what do I get? I thought I was getting RAID1+0; am I really?
>
> Maybe you are, depending on your settings, but this is beside the point.
> No matter what 1+0 you have (linux, classic, or otherwise) you can not
> boot from it, as there is no way to see the underlying filesystem without
> the RAID layer. With the current state of affairs (available mainstream
> bootloaders) the rule is: block devices containing the kernel/initrd image
> _must_ be either:
>
> * a regular block device (/sda1, /hda, /fd0, etc.)
> * or a linux RAID 1 with the superblock at the end of the device (0.9 or
>   1.2)

If any poor soul finds this in the mailing list archives, the above should
read:

* or a linux RAID 1 with the superblock at the end of the device (either
  version 0.9 or _1.0_)
Re: Help, big error, dd first GB of a raid:-/
Lars Schimmer wrote:
> Hi! Due to a very bad idea/error, I zeroed the first GB of /dev/md0. Now
> fdisk doesn't find any disk on /dev/md0. Any idea on how to recover?

It largely depends on what /dev/md0 is, and what was on /dev/md0. Provide
very detailed info:

* Was the MD device partitioned?
* What filesystem(s) were residing on the array, of what sizes, in what
  order?
* What was each filesystem used for (mounted as what)?

Someone might be able to help at this point, however if you do not have a
backup - you are in very very deep trouble already.
Re: Help, big error, dd first GB of a raid:-/
Lars Schimmer wrote:
> I activated the backup right now - it was OpenAFS with some RW volumes -
> fairly easy to back up, but... If it's hard to recover raid data, I'll
> recreate the raid and forget the old data on it.

It is not that hard to recover the raid itself, however the ext3 on top is
most likely FUBAR (especially after 1GB was overwritten). Since it seems the
data is not that important to you, just roll back to a backup and move on.
Re: In this partition scheme, grub does not find md information?
Michael Tokarev wrote:
> With 5-drive linux raid10:
>
>  A  B  C  D  E
>  0  0  1  1  2
>  2  3  3  4  4
>  5  5  6  6  7
>  7  8  8  9  9
> 10 10 11 11 12
> ...
>
> AB can't be removed - 0, 5. AC CAN be removed, as can AD. But not AE -
> losing 2 and 7. And so on.

I stand corrected by Michael, this is indeed the case with the current state
of md raid 10. Either my observations were incorrect when I made them a year
and a half ago, or some fixes went into the kernel since then. In any case -
linux md10 does behave exactly as a classic raid 1+0 when created with -n D
-p nS, where D and S are both even and D = 2S.
Re: In this partition scheme, grub does not find md information?
Keld Jørn Simonsen wrote:
> On Wed, Jan 30, 2008 at 03:47:30PM +0100, Peter Rabbitson wrote:
>> Michael Tokarev wrote:
>>> With 5-drive linux raid10:
>>>
>>>  A  B  C  D  E
>>>  0  0  1  1  2
>>>  2  3  3  4  4
>>>  5  5  6  6  7
>>>  7  8  8  9  9
>>> 10 10 11 11 12
>>> ...
>>>
>>> AB can't be removed - 0, 5. AC CAN be removed, as can AD. But not AE -
>>> losing 2 and 7. And so on.
>
> I see. Does the kernel code allow this? And mdadm? And can B+E be removed
> safely, and C+E and B+D?

It seems like it. I just created the above raid configuration with 5 loop
devices. Everything behaved just like Michael described. When the wrong
drives disappeared - I started getting IO errors.
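The loop-device experiment can also be checked on paper. Here is a minimal sketch (my own code, assuming the consecutive near-layout shown in Michael's table, where chunk c has its copies at positions 2c and 2c+1 taken round-robin across the 5 drives) that enumerates which two-drive losses the array survives:

```python
# Model Michael's 5-drive raid10 near-2 layout: chunk c is stored on drives
# (2c) % 5 and (2c+1) % 5. A set of failed drives loses data iff some chunk
# has all of its copies on failed drives.

from itertools import combinations

def survives(lost, n_drives=5, copies=2, n_chunks=20):
    """True if no chunk has every copy on a lost drive."""
    for c in range(n_chunks):
        holders = {(copies * c + i) % n_drives for i in range(copies)}
        if holders <= set(lost):      # all copies of chunk c are gone
            return False
    return True

names = "ABCDE"
safe = ["".join(names[d] for d in pair)
        for pair in combinations(range(5), 2) if survives(pair)]
print(safe)   # -> ['AC', 'AD', 'BD', 'BE', 'CE']
```

This agrees with the thread: AB and AE lose data, while AC, AD, and also B+D, B+E, C+E are survivable.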
Re: [PATCH] Use new sb type
Tim Southerwood wrote:
> David Greaves wrote:
>> IIRC Doug Leford did some digging wrt lilo + grub and found that 1.1 and
>> 1.2 wouldn't work with them. I'd have to review the thread though...
>> David
>
> For what it's worth, that was my finding too. -e 0.9 and 1.0 are fine with
> GRUB, but 1.1 and 1.2 won't work under the filesystem that contains /boot,
> at least with GRUB 1.x (I haven't used LILO for some time, nor have I
> tried the development GRUB 2). The reason IIRC boils down to the fact that
> GRUB 1 isn't MD aware, and the only reason one can get away with using it
> on a RAID 1 setup at all is that the constituent devices present the same
> data as the composite MD device, from the start. Putting an MD SB at/near
> the beginning of the device breaks this case and GRUB 1 doesn't know how
> to deal with it.
>
> Cheers
> Tim

I read the entire thread, and it seems that the discussion drifted away from
the issue at hand. I hate flogging a dead horse, but here are my 2 cents.

First the summary:

* Currently LILO and GRUB are the only booting mechanisms widely available
  (GRUB2 is nowhere to be seen, and seems to be badly designed anyway)
* Both of these boot mechanisms do not understand RAID at all, so they can
  boot only off a block device containing a hackishly-readable filesystem
  (lilo: files are mappable, grub: a stage1.5 exists)
* The only raid level providing unfettered access to the underlying
  filesystem is RAID1 with a superblock at its end, and it has been common
  wisdom for years that you need a RAID1 boot partition in order to boot
  anything at all.

The problem is that these three points do not affect any other raid level
(as you can not boot from any of them in a reliable fashion anyway). I saw a
number of voices saying that backward compatibility must be preserved. I
don't see any need for that, because:

* The distro managers will definitely RTM and will adjust their flashy GUIs
  to do the right thing by explicitly supplying -e 1.0 for boot devices
* A clueless user might burn himself by making a single root on a single
  raid1 device. But wait - he can burn himself the same way by making the
  root a raid5 device and rebooting.

Why do we sacrifice the right thing to do? To eliminate the possibility of
someone shooting himself in the foot by not reading the manual?

Cheers

Peter
Re: write-intent bitmaps
Russell Coker wrote:
> Are there plans for supporting a NVRAM write-back cache with Linux
> software RAID?

AFAIK even today you can place the bitmap in an external file residing on a
filesystem which in turn can reside on the nvram...

Peter
Re: In this partition scheme, grub does not find md information?
Moshe Yudkowsky wrote:
> One of the puzzling things about this is that I conceive of RAID10 as two
> RAID1 pairs, with RAID0 on top to join them into a large drive. However,
> when I use --level=10 to create my md drive, I cannot find out which two
> pairs are the RAID1's: --detail doesn't give that information. Re-reading
> the md(4) man page, I think I'm badly mistaken about RAID10. Furthermore,
> since grub cannot find the /boot on the md drive, I deduce that RAID10
> isn't what the 'net descriptions say it is.

It is exactly what the name implies - a new kind of RAID :) The setup you
describe is not RAID10, it is RAID1+0. As far as how linux RAID10 works -
here is an excellent article:
http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

Peter
Re: In this partition scheme, grub does not find md information?
Michael Tokarev wrote:
> Raid10 IS RAID1+0 ;) It's just that the linux raid10 driver can utilize
> more.. interesting ways to lay out the data.

This is misleading, and adds to the confusion that existed even before linux
raid10. When you say raid10 in the hardware raid world, what do you mean?
Stripes of mirrors? Mirrors of stripes? Some proprietary extension? What
Neil did was generalize the concept of N drives - M copies, and called it 10
because it could exactly mimic the layout of conventional 1+0 [*]. However
thinking about md level 10 in terms of RAID 1+0 is wrong. Two examples
(there are many more):

* mdadm -C -l 10 -n 3 -p f2 /dev/md10 /dev/sda1 /dev/sdb1 /dev/sdc1

  An odd number of drives, no parity calculation overhead, yet the setup can
  still suffer the loss of a single drive.

* mdadm -C -l 10 -n 2 -p f2 /dev/md10 /dev/sda1 /dev/sdb1

  This seems useless at first, as it effectively creates a RAID1 setup,
  without preserving the FS format on disk. However md10 has read balancing
  code, so one could get a single-thread sustained read at twice the speed
  one could possibly get with md1 in the current implementation.

I guess I will sit down tonight and craft some patches to the existing md*
man pages. Some things are indeed left unsaid.

Peter

[*] The layout is the same but the functionality is different. If you have
1+0 on 4 drives, you can survive a loss of 2 drives as long as they are part
of different mirrors. An array made with mdadm -C -l 10 -n 4 -p n2 however
will _NOT_ survive a loss of 2 drives.
Re: In this partition scheme, grub does not find md information?
Moshe Yudkowsky wrote:
> Here's a baseline question: if I create a RAID10 array using default
> settings, what do I get? I thought I was getting RAID1+0; am I really?

Maybe you are, depending on your settings, but this is beside the point. No
matter what 1+0 you have (linux, classic, or otherwise) you can not boot
from it, as there is no way to see the underlying filesystem without the
RAID layer. With the current state of affairs (available mainstream
bootloaders) the rule is: block devices containing the kernel/initrd image
_must_ be either:

* a regular block device (/sda1, /hda, /fd0, etc.)
* or a linux RAID 1 with the superblock at the end of the device (0.9 or
  1.2)

> My superblocks, by the way, are marked version 01; my metadata in
> mdadm.conf asked for 1.2. I wonder what I really got.

This is how you find the actual raid version:

mdadm -D /dev/md[X] | grep Version

This will return a string of the form XX.YY.ZZ. Your superblock version is
XX.YY.
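A trivial sketch of that parsing rule (the helper name is mine; it just takes the XX.YY part of the mdadm -D version string and drops the zero padding):

```python
# Map an mdadm -D "Version : XX.YY.ZZ" string to the superblock format name.
# Assumes the rule stated above: the superblock version is the XX.YY part.

def superblock_format(version_string):
    major, minor = version_string.split(".")[:2]
    return f"{int(major)}.{int(minor)}"

print(superblock_format("00.90.03"))   # -> 0.90
print(superblock_format("01.02.00"))   # -> 1.2
```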
Re: In this partition scheme, grub does not find md information?
Moshe Yudkowsky wrote:
> Keld Jørn Simonsen wrote:
>> raid10 has a number of ways to do layout, namely the near, far and offset
>> ways, layout=n2, f2, o2 respectively.
>
> The default layout, according to --detail, is near=2, far=1. If I
> understand what's been written so far on the topic, that's automatically
> incompatible with 1+0.

Unfortunately you are interpreting this wrong as well. far=1 is just a way
of saying "no copies of type far".
BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)
Hello,

It seems that mdadm/md do not perform proper sanity checks before adding a
component to a degraded array. If the size of the new component is just
right, the superblock information will overlap with the data area. This will
happen without any error indication in the syslog or otherwise. I came up
with a reproducible scenario which I am attaching to this email alongside
the entire test script. I have not tested it with other raid levels, or
other types of superblocks, but I suspect the same problem will occur for
many other configurations. I am willing to test patches; the attached script
is non-intrusive enough to be executed anywhere. The output of the script
follows below.

Peter

======

[EMAIL PROTECTED]:/media/space/testmd# ./md_overlap_test
Creating component 1 (1056768 bytes)... done.
Creating component 2 (1056768 bytes)... done.
Creating component 3 (1056768 bytes)... done.

===
Creating 3 disk raid5 array with v1.1 superblock
mdadm: array /dev/md9 started.
Waiting for resync to finish... done.

md9 : active raid5 loop3[3] loop2[1] loop1[0]
      2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Initial checksum of raw raid5 device: 4df1921524a3b717a956fceaed0ae691  /dev/md9

===
Failing first component
mdadm: set /dev/loop1 faulty in /dev/md9
mdadm: hot removed /dev/loop1

md9 : active raid5 loop3[3] loop2[1]
      2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/2] [_UU]

Checksum of raw raid5 device after failing component: 4df1921524a3b717a956fceaed0ae691  /dev/md9

===
Re-creating block device with size 1048576 bytes, so both the superblock and data start at the same spot
Adding back to array
mdadm: added /dev/loop1
Waiting for resync to finish... done.

md9 : active raid5 loop1[4] loop3[3] loop2[1]
      2048 blocks super 1.1 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Checksum of raw raid5 device after adding back smaller component: bb854f77ad222d224fcdd8c8f96b51f0  /dev/md9

===
Attempting recovery
Waiting for recovery to finish...
done.
Performing check
Waiting for check to finish... done.
Current value of mismatch_cnt: 0
Checksum of raw raid5 device after repair/check: 146f5c37305c42cda64538782c8c3794  /dev/md9
[EMAIL PROTECTED]:/media/space/testmd#

#!/bin/bash

echo "Please read the script first, and comment the exit line at the top."
echo "This script will require about 3MB of free space, it will free (and use)"
echo "loop devices 1 2 and 3, and will use the md device number specified in MD_DEV."
exit 0

MD_DEV=md9      # make sure this is not an array you use
COMP_NUM=3
COMP_SIZE=$((1 * 1024 * 1024 + 8192))   # 1MiB comp sizes with room for 8k (16 sect) of metadata

mdadm -S /dev/$MD_DEV > /dev/null

DEVS=""
for i in $(seq $COMP_NUM); do
    echo -n "Creating component $i ($COMP_SIZE bytes)... "
    losetup -d /dev/loop${i} > /dev/null
    set -e
    # fill entire image with the component number (0xiii...)
    PCMD="print \"\\x${i}${i}\" x $COMP_SIZE"
    perl -e "$PCMD" > dummy${i}.img
    losetup /dev/loop${i} dummy${i}.img
    DEVS="$DEVS /dev/loop${i}"
    set +e
    echo "done."
done

echo
echo
echo "==="
echo "Creating $COMP_NUM disk raid5 array with v1.1 superblock"
# superblock at beginning of blockdev guarantees that it will overlap with
# real data, not with parity
mdadm -C /dev/$MD_DEV -l 5 -n $COMP_NUM -e 1.1 $DEVS
echo -n "Waiting for resync to finish... "
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
    echo -n .
    sleep 1
done
echo "done."

echo
grep -A1 $MD_DEV /proc/mdstat
echo
echo -n "Initial checksum of raw raid5 device: "
md5sum /dev/$MD_DEV
echo
echo
echo "==="
echo "Failing first component"
mdadm -f /dev/$MD_DEV /dev/loop1
mdadm -r /dev/$MD_DEV /dev/loop1
echo
grep -A1 $MD_DEV /proc/mdstat
echo
echo -n "Checksum of raw raid5 device after failing component: "
md5sum /dev/$MD_DEV
echo
echo
echo "==="
NEWSIZE=$(( $COMP_SIZE - $(cat /sys/block/$MD_DEV/md/rd1/offset) * 512 ))
echo "Re-creating block device with size $NEWSIZE bytes, so both the superblock and data start at the same spot"
losetup -d /dev/loop1 > /dev/null
PCMD="print \"\\x11\" x $NEWSIZE"
perl -e "$PCMD" > dummy1.img
losetup /dev/loop1 dummy1.img
echo "Adding back to array"
mdadm -a /dev/$MD_DEV /dev/loop1
echo -n "Waiting for resync to finish... "
while [ "$(cat /sys/block/$MD_DEV/md/sync_action)" != "idle" ]; do
    echo -n .
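The size arithmetic the script exploits can be restated in a few lines (my own sketch; the 16-sector data offset is the value the script reads back from /sys/block/mdX/md/rd1/offset):

```python
# Why COMP_SIZE - data_offset is the "just right" bad size: the degraded
# array still expects each component to provide (comp_size - data_offset)
# bytes of data, but the shrunken replacement, after its own v1.1 superblock
# claims data_offset bytes at the front, has less room than that.

SECTOR = 512
comp_size = 1 * 1024 * 1024 + 8192   # original component: 1 MiB + 8 KiB metadata room
data_offset = 16 * SECTOR            # assumed v1.1 data offset (16 sectors, per the script)

data_per_component = comp_size - data_offset   # what the array still expects
new_size = comp_size - data_offset             # the shrunken replacement
room = new_size - data_offset                  # data room left after the new superblock

print(new_size)                       # -> 1048576, matching the script output
print(room < data_per_component)      # -> True: data and metadata must overlap
```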
Re: BUG: possible array corruption when adding a component to a degraded raid5 (possibly other levels too)
Neil Brown wrote:
> On Monday January 28, [EMAIL PROTECTED] wrote:
>> Hello,
>> It seems that mdadm/md do not perform proper sanity checks before adding
>> a component to a degraded array. If the size of the new component is just
>> right, the superblock information will overlap with the data area. This
>> will happen without any error indications in the syslog or otherwise.
>
> I thought I fixed that. What versions of Linux kernel and mdadm are you
> using for your tests?

Linux is 2.6.23.14 with everything md related compiled in (no modules).
mdadm - v2.6.4 - 19th October 2007 (latest in debian/sid)

Peter
Re: [PATCH] Use new sb type
David Greaves wrote:
> Jan Engelhardt wrote:
>> This makes 1.0 the default sb type for new arrays.
>
> IIRC there was a discussion a while back on renaming mdadm options (google
> "Time to deprecate old RAID formats?") and the superblocks, to emphasise
> the location and data structure. Would it be good to introduce the new
> names at the same time as changing the default format/on-disk-location?
>
> David

Also, wasn't the concession to make 1.1 the default instead of 1.0?

Peter
Problem with raid5 grow/resize (not restripe)
Hello,

I can not seem to be able to extend a raid volume of mine slightly. I issue
the command:

mdadm --grow --size=max /dev/md5

It completes and nothing happens. The kernel log is empty, however the event
counter on the drive is incremented by +3. Here is what I have (yes, I know
that I am resizing only by about 200MB). Why am I not able to reach
824.8GiB? Thank you for your help.

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md5 : active raid5 sda3[4] sdd3[3] sdc3[2] sdb3[1]
      864276480 blocks super 1.1 level 5, 2048k chunk, algorithm 2 [4/4] [UUUU]

md10 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      5353472 blocks 1024K chunks 3 far-copies [4/4] [UUUU]

md1 : active raid1 sdd1[1] sdc1[0] sdb1[3] sda1[2]
      56128 blocks [4/4] [UUUU]

unused devices: <none>

[EMAIL PROTECTED]:~# mdadm -D /dev/md5
/dev/md5:
        Version : 01.01.03
  Creation Time : Tue Jan 22 03:52:42 2008
     Raid Level : raid5
     Array Size : 864276480 (824.24 GiB 885.02 GB)
  Used Dev Size : 576184320 (274.75 GiB 295.01 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 5
    Persistence : Superblock is persistent
    Update Time : Wed Jan 23 02:21:47 2008
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 2048K
           Name : Thesaurus:Crypta  (local to host Thesaurus)
           UUID : 1decb2d1:ebf16128:a240938a:669b0999
         Events : 5632

    Number   Major   Minor   RaidDevice State
       4       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3

[EMAIL PROTECTED]:~# fdisk -l /dev/sd[abcd]

Disk /dev/sda: 400.0 GB, 400088457216 bytes
255 heads, 63 sectors/track, 48641 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1           7       56196   fd  Linux raid autodetect
/dev/sda2               8         507     4016250   fd  Linux raid autodetect
/dev/sda3             508       36385   288190035   83  Linux
/dev/sda4           36386       48641    98446320   83  Linux

Disk /dev/sdb: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1           7       56196   fd  Linux raid autodetect
/dev/sdb2               8         507     4016250   fd  Linux raid autodetect
/dev/sdb3             508       36385   288190035   83  Linux
/dev/sdb4           36386       38913    20306160   83  Linux

Disk /dev/sdc: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1           7       56196   fd  Linux raid autodetect
/dev/sdc2               8         507     4016250   fd  Linux raid autodetect
/dev/sdc3             508       36385   288190035   83  Linux
/dev/sdc4           36386       36483      787185   83  Linux

Disk /dev/sdd: 300.0 GB, 300090728448 bytes
255 heads, 63 sectors/track, 36483 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1           7       56196   fd  Linux raid autodetect
/dev/sdd2               8         507     4016250   fd  Linux raid autodetect
/dev/sdd3             508       36385   288190035   83  Linux
/dev/sdd4           36386       36483      787185   83  Linux

[EMAIL PROTECTED]:~#
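For what it's worth, a back-of-envelope check (my arithmetic, not mdadm's) using the numbers above suggests there is very little left to grow into: each sd[abcd]3 partition is 288190035 KiB, while the array already uses 576184320 sectors of each component:

```python
# Sanity-check the reported sizes: a 4-disk RAID5 has the capacity of
# 3 components, and the per-component headroom is only the superblock,
# bitmap reservation, and 2048K chunk rounding.

KIB, SECTOR = 1024, 512
partition_bytes = 288190035 * KIB    # each sdX3 partition, from fdisk -l
used_bytes = 576184320 * SECTOR      # Used Dev Size, from mdadm -D

array_gib = 3 * used_bytes / 2**30
print(round(array_gib, 2))           # -> 824.24, matching the Array Size line

slack_mib = (partition_bytes - used_bytes) / 2**20
print(round(slack_mib, 1))           # per-component slack, well under 100 MiB
```

So the whole array can only gain on the order of 3 x ~100 MiB, not enough to reach 824.8 GiB with these partitions.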
Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
Justin Piszcz wrote:
> mdadm --create \
>   --verbose /dev/md3 \
>   --level=5 \
>   --raid-devices=10 \
>   --chunk=1024 \
>   --force \
>   --run /dev/sd[cdefghijkl]1
>
> Justin.

Interesting, I came up with the same results (1M chunk being superior) with
a completely different raid set with XFS on top:

mdadm --create \
  --level=10 \
  --chunk=1024 \
  --raid-devices=4 \
  --layout=f3 \
  ...

Could it be attributed to XFS itself?

Peter
Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k
Justin Piszcz wrote:
> On Thu, 28 Jun 2007, Peter Rabbitson wrote:
>> Interesting, I came up with the same results (1M chunk being superior)
>> with a completely different raid set with XFS on top: ... Could it be
>> attributed to XFS itself? Peter
>
> Good question, by the way how much cache do the drives have that you are
> testing with?

I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

 Model=aMtxro7 2Y050M , FwRev=AY5RH10W, SerialNo=6YB6Z7E4
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0: ATA/ATAPI-1 ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7

 * signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M
respectively).
Re: LVM on raid10 - severe performance drop
Bernd Schubert wrote:
> Try to increase the read-ahead size of your lvm devices:
>
> blockdev --setra 8192 /dev/raid10/space
>
> or increase it at least to the same size as that of your raid (blockdev
> --getra /dev/mdX).

This did the trick, although I am still lagging behind the raw md device by
about 3-4%. Thanks for pointing this out!
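Note that blockdev --setra/--getra count in 512-byte sectors; a one-liner (helper name mine) shows what the values in this thread translate to:

```python
# Convert a blockdev --setra/--getra value (512-byte sectors) to KiB.

def ra_kib(sectors):
    return sectors * 512 // 1024

print(ra_kib(8192))    # -> 4096 (KiB), i.e. 4 MiB of read-ahead for the LV
print(ra_kib(256))     # -> 128 (KiB), a typical per-disk value
```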
question about --assume-clean
Hi, I am about to create a large raid10 array, and I know for a fact that all the components are identical (dd if=/dev/zero of=/dev/sdXY). Is it safe to pass --assume-clean and spare 6 hours of reconstruction, or are there some hidden dangers in doing so? Thanks Peter - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
LVM on raid10 - severe performance drop
Hi,

This question might be better suited for the lvm mailing list, but raid10 being rather new, I decided to ask here first. Feel free to direct me elsewhere.

I want to use lvm on top of a raid10 array, as I need the snapshot capability for backup purposes. The tuning and creation of the array went fine, and I am getting the read performance I am looking for. However, as soon as I create a VG using the array as the only PV, the raw read performance drops to the ground. I suspect it has to do with some minimal tuning of LVM parameters, but I am at a loss on what to tweak (and Google is certainly evil to me today). Below I am including my configuration and test results; please let me know if you spot anything wrong, or have any suggestions. Thank you!

Peter

[EMAIL PROTECTED]:~# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Sat Jun  9 15:28:01 2007
     Raid Level : raid10
     Array Size : 317444096 (302.74 GiB 325.06 GB)
  Used Dev Size : 238083072 (227.05 GiB 243.80 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent
    Update Time : Sat Jun  9 19:33:29 2007
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
         Layout : near=1, far=3
     Chunk Size : 1024K
           UUID : c16dbfd8:8a139e54:6e26228f:2ab99bd0 (local to host Arzamas)
         Events : 0.4

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       3       8       50        3      active sync   /dev/sdd2

[EMAIL PROTECTED]:~# pvs -v
    Scanning for physical volume names
  PV       VG     Fmt  Attr PSize   PFree   DevSize PV UUID
  /dev/md1 raid10 lvm2 a-   302.73G 300.73G 302.74G vS7gT1-WTeh-kXng-Iw7y-gzQc-1KSH-mQ1PQk

[EMAIL PROTECTED]:~# vgs -v
    Finding all volume groups
    Finding volume group "raid10"
  VG     Attr   Ext   #PV #LV #SN VSize   VFree   VG UUID
  raid10 wz--n- 4.00M   1   1   0 302.73G 300.73G ZosHXa-B1Iu-bax1-zMDk-FUbp-37Ff-k01aOK

[EMAIL PROTECTED]:~# lvs -v
    Finding all logical volumes
  LV    VG     #Seg Attr   LSize Maj Min KMaj KMin Origin Snap% Move Copy% Log LV UUID
  space raid10    1 -wi-a- 2.00G  -1  -1 253    0                               i0p99S-tWFz-ELpl-bGXt-4CWz-Elr4-a1ao8f

[EMAIL PROTECTED]:~# dd if=/dev/md1 of=/dev/null bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 11.4846 seconds, 183 MB/s

[EMAIL PROTECTED]:~# dd if=/dev/md1 of=/dev/null bs=512 count=4000000
4000000+0 records in
4000000+0 records out
2048000000 bytes (2.0 GB) copied, 11.4032 seconds, 180 MB/s

[EMAIL PROTECTED]:~# dd if=/dev/raid10/space of=/dev/null bs=1M count=2000
2000+0 records in
2000+0 records out
2097152000 bytes (2.1 GB) copied, 25.7089 seconds, 81.6 MB/s

[EMAIL PROTECTED]:~# dd if=/dev/raid10/space of=/dev/null bs=512 count=4000000
4000000+0 records in
4000000+0 records out
2048000000 bytes (2.0 GB) copied, 26.1776 seconds, 78.2 MB/s

P.S. I know that dd is not the best benchmarking tool, but the difference is so big that even this non-scientific approach works.

- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
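A quick sanity check for this kind of md-vs-LVM gap is whether the LVM data area starts on a raid stripe boundary: the last field of `dmsetup table` for the LV is its start offset in 512-byte sectors on the underlying device. The offset and geometry below are assumed example values, not taken from the output above:

```shell
# Hypothetical values -- substitute the real ones for your array:
CHUNK_KIB=1024        # mdadm chunk size (see `mdadm -D`)
DRIVES=4              # raid members in the array
STRIPE_KIB=$((CHUNK_KIB * DRIVES))

# Assumed example offset, in 512-byte sectors, from `dmsetup table`:
OFFSET_SECTORS=4194304
OFFSET_KIB=$((OFFSET_SECTORS / 2))

if [ $((OFFSET_KIB % STRIPE_KIB)) -eq 0 ]; then
    echo "aligned"
else
    echo "misaligned by $((OFFSET_KIB % STRIPE_KIB)) KiB"
fi
# -> prints "aligned" for these example numbers
```

If the remainder is non-zero, every stripe-sized read through the LV touches one more chunk than the same read against the bare md device, which is consistent with a drop like the one above.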
Re: Customize the error emails of `mdadm --monitor`
Iustin Pop wrote:
> On Wed, Jun 06, 2007 at 01:31:44PM +0200, Peter Rabbitson wrote:
>> Is there a way to list the _number_ in addition to the name of a
>> problematic component? The kernel trend to move all block devices into
>> the sdX namespace, combined with the dynamic name allocation, renders
>> messages like "/dev/sdc1 has problems" meaningless. It would make
>> remote server support so much easier, by allowing the administrator to
>> label drive trays Component0, Component1, Component2... etc, and be
>> sure that the local tech support person will not pull out the wrong
>> drive from the system. Any takers? Or is it a RTFM question (in which
>> case I certainly overlooked the relevant doc)?
>
> If you use udev, have you looked in /dev/disk? I think it solves your
> problem by allowing one to see the disks either by id or by path. Making
> the reverse map is then trivial (for a reasonable number of disks).

This would not work, as arrays are assembled by the kernel at boot time, at which point there is no udev or anything else for that matter other than /dev/sdX. And I am pretty sure my OS (debian) does not support udev in the initrd as of yet.

Pete
Re: Customize the error emails of `mdadm --monitor`
Gabor Gombas wrote:
> On Wed, Jun 06, 2007 at 02:23:31PM +0200, Peter Rabbitson wrote:
>> This would not work, as arrays are assembled by the kernel at boot
>> time, at which point there is no udev or anything else for that matter
>> other than /dev/sdX. And I am pretty sure my OS (debian) does not
>> support udev in the initrd as of yet.
>
> But I think sending mails from the initrd isn't supported either, so if
> you already hack the initrd, you can get the path information from
> sysfs. udev is nothing magical: it just walks the sysfs tree and calls
> some little helper programs when collecting the information for
> building /dev/disk; you can do that yourself if you want.
>
> Gabor

I think I did not make my problem clear enough. The _device name_ reported in the emails is the one with which the array was initially assembled. For this I have two choices:

* Kernel auto-assembly - the parts are properly detected and assembled, but there is no strong relationship between component number and sdX, especially if asynchronous scsi scanning takes place.

* Assembly by mdadm.conf - I can put whatever block devices I want in there, and they will be preserved in the email, but it is very cumbersome to do it for root and other system partitions.

So I was asking if the component _number_, which is unique to a specific device regardless of the assembly mechanism, can be reported in case of a failure.
Re: Customize the error emails of `mdadm --monitor`
Gabor Gombas wrote:
> On Wed, Jun 06, 2007 at 04:24:31PM +0200, Peter Rabbitson wrote:
>> So I was asking if the component _number_, which is unique to a
>> specific device regardless of the assembly mechanism, can be reported
>> in case of a failure.
>
> So you need to write an event-handling script and pass it to mdadm
> (--program). In the script you can walk sysfs and/or call the
> appropriate helper programs to extract all the information you need and
> format it the way you want. For example, if you want the slot number of
> a failed disk, you can get it from /sys/block/$2/md/dev-$3/slot
> (according to the manpage, not tested).

Now that's some real advice. I had not thought of that. Thank you!

Peter
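For anyone wanting to try this, a minimal handler along the lines Gabor describes might look like the sketch below. The argument order follows the mdadm manpage (event, md device, optional component device); the SYSFS_ROOT override is a made-up hook so the function can be exercised against a mock sysfs tree, and the final mail step is left as a comment:

```shell
#!/bin/sh
# Hedged sketch of a handler for `mdadm --monitor --program`.
# mdadm calls it as: <program> <event> <md-device> [<component-device>]

report_event() {
    event=$1
    md=$(basename "$2")
    dev=$(basename "${3:-}")
    # Per the manpage hint in this thread (untested there as well):
    slot_file="${SYSFS_ROOT:-/sys}/block/$md/md/dev-$dev/slot"

    if [ -n "$dev" ] && [ -r "$slot_file" ]; then
        # The slot number survives sdX renaming, so it can be matched
        # to a physical tray label such as "Component2".
        echo "$event on $md: $dev is component $(cat "$slot_file")"
    else
        echo "$event on $md"
    fi
}

if [ $# -ge 2 ]; then
    report_event "$@"    # in real use: report_event "$@" | mail root
fi
```

Hooked up with something like `mdadm --monitor --scan --program /usr/local/sbin/md-event` (path hypothetical), the mail then names the stable component number instead of a floating /dev/sdX name.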
Customize the error emails of `mdadm --monitor`
Hi,

Is there a way to list the _number_ in addition to the name of a problematic component? The kernel trend to move all block devices into the sdX namespace, combined with the dynamic name allocation, renders messages like "/dev/sdc1 has problems" meaningless. It would make remote server support so much easier, by allowing the administrator to label drive trays Component0, Component1, Component2... etc, and be sure that the local tech support person will not pull out the wrong drive from the system.

Thanks
Peter
Re: how to synchronize two devices (RAID-1, but not really?)
Tomasz Chmielewski wrote:
> I have a RAID-10 setup of four 400 GB HDDs. As the data grows by
> several GBs a day, I want to migrate it somehow to RAID-5 on separate
> disks in a separate machine. Which would be easy, if I didn't have to
> do it online, without stopping any services.

Your /dev/md10 - what is directly on top of it? LVM? XFS? EXT3?
Re: Raid1 replaced with raid10?
Neil Brown wrote:
> On Friday May 4, [EMAIL PROTECTED] wrote:
>> Hi, I asked this question back in March but received no answers, so
>> here it goes again. Is it safe to replace raid1 with raid10 where the
>> amount of disks is equal to the amount of far/near/offset copies? I
>> understand it has the downside of not being a bit-by-bit mirror of a
>> plain filesystem. Are there any other caveats?
>
> To answer the original question, I assume you mean "replace" as in
> backup, create new array, then restore. You will get different
> performance characteristics. Whether they better suit your needs or not
> will depend largely on your needs.

Hi Neil,

Yes, I meant take an existing 2 drive raid1 array (non bootable data) and put a raid10 array in its place. All my testing indicates that I get the same write performance but nearly double the read speed (due to interleaving, I guess). It seemed too good to be true, thus I am asking the question. Could you elaborate on your last sentence? Are there downsides I could not think of?

Thank you!
Peter
Re: Raid1 replaced with raid10?
Neil Brown wrote:
> On Monday May 7, [EMAIL PROTECTED] wrote:
>> Yes, I meant take an existing 2 drive raid1 array (non bootable data)
>> and put a raid10 array in its place. All my testing indicates that I
>> get the same write performance but nearly double the read speed (due
>> to interleaving, I guess). It seemed too good to be true, thus I am
>> asking the question. Could you elaborate on your last sentence? Are
>> there downsides I could not think of?
>
> I would have thought that you need "far" or "offset" to improve read
> performance, and they tend to hurt write performance (though I haven't
> really measured offset much). What layout are you using?

Correct, I am using the 'far' layout. The interleaving of the 'offset' layout does not work too well for sequential reads, but far really shines. Yes, write performance is hurt by about 10%. Compared to a 190% gain in reads, I can live with it.
Re: Raid1 replaced with raid10?
Bill Davidsen wrote:
> Not worth a repost, since I was way over answering his question...

Erm... and now you made me curious :) Please share your thoughts if it is not too much trouble. Thank you for your time.

Peter
Speed variation depending on disk position (was: Linux SW RAID: HW Raid Controller/JBOD vs. Multiple PCI-e Cards?)
Chris Wedgwood wrote:
> <snip>
> Also, 'dd performance' varies between the start of a disk and the end.
> Typically you get better performance at the start of the disk, so dd
> might not be a very good benchmark here.

Hi,

Sorry for hijacking this thread, but I was actually planning to ask this very same question. Is the behavior you are describing above manufacturer dependent, or is it pretty much dictated by the general design of modern drives? I have an array of 4 Maxtor sata drives, and raw read performance at the end of the disk is 38MB/s compared to 62MB/s at the beginning.

Thanks
Raid1 replaced with raid10?
Hi,

I asked this question back in March but received no answers, so here it goes again. Is it safe to replace raid1 with raid10 where the amount of disks is equal to the amount of far/near/offset copies? I understand it has the downside of not being a bit-by-bit mirror of a plain filesystem. Are there any other caveats?

Thanks
Peter
Re: XFS sunit/swidth for raid10
dean gaudet wrote:
> On Thu, 22 Mar 2007, Peter Rabbitson wrote:
>> dean gaudet wrote:
>>> On Thu, 22 Mar 2007, Peter Rabbitson wrote:
>>>> Hi, How does one determine the XFS sunit and swidth sizes for a
>>>> software raid10 with 3 copies?
>>>
>>> mkfs.xfs uses the GET_ARRAY_INFO ioctl to get the data it needs from
>>> software raid and select an appropriate sunit/swidth... although i'm
>>> not sure i agree entirely with its choice for raid10:
>>
>> So do I, especially as it makes no checks for the amount of copies
>> (3 in my case, not 2).
>
> it probably doesn't matter.
>
>> This was essentially my question. For an array -pf3 -c1024 I get
>> swidth = 4 * sunit = 4MiB. Is it about right, and does it matter at
>> all?
>
> how many drives?

Sorry. 4 drives, 3 far copies (so any 2 drives can fail), 1M chunk.
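For the record, the numbers mkfs.xfs picked for this array can be reproduced by hand. The formula below is my reading of the thread, not an authoritative one: sunit is the md chunk expressed in 512-byte sectors, and swidth spans all 4 members regardless of the copy count:

```shell
CHUNK_KIB=1024   # md chunk size of this array (-c1024)
NDRIVES=4        # raid members

SUNIT=$((CHUNK_KIB * 2))      # KiB -> 512-byte sectors: 2048
SWIDTH=$((SUNIT * NDRIVES))   # one stripe across all members: 8192

# The equivalent explicit invocation (device name hypothetical):
echo "mkfs.xfs -d sunit=$SUNIT,swidth=$SWIDTH /dev/md0"
```

8192 sectors is 4MiB, matching the "swidth = 4 * sunit = 4MiB" that mkfs.xfs chose automatically above.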
XFS sunit/swidth for raid10
Hi,

How does one determine the XFS sunit and swidth sizes for a software raid10 with 3 copies?

Thanks
Peter
raid10 far layout outperforms offset at writing? (was: Help with chunksize on raid10 -p o3 array)
Peter Rabbitson wrote:
> I have been trying to figure out the best chunk size for raid10 before
> migrating my server to it (currently raid1). I am looking at 3 offset
> stripes, as I want to have two drive failure redundancy, and offset
> striping is said to have the best write performance, with read
> performance equal to far.

Incorporating suggestions from previous posts (thank you everyone), I used this modified script: http://rabbit.us/pool/misc/raid_test2.txt

To negate the effects of caching, memory was jammed below 200MB free by using a full tmpfs mount with no swap. Here is what I got with the far layout (-p f3): http://rabbit.us/pool/misc/raid_far.html

The clear winner is 1M chunks, and it is very consistent at any block size. I was surprised even more to see that my read speed was identical to that of a raid0, getting near the _maximum_ physical speed of 4 drives (roughly 55MB/s sustained across 1.2G). Unlike the offset layout, far really shines at reading stuff back. The write speed did not suffer noticeably compared to offset striping. Here are the results for comparison (-p o3): http://rabbit.us/pool/misc/raid_offset.html, and they roughly seem to correlate with my earlier testing using dd.

So I guess the way to go for this system will be f3, although md(4) says that the offset layout should be more beneficial. Is there anything I missed while setting up my o3 array, so that I got worse performance for both read and write compared to f3?

Once again, thanks everyone for the help.

Peter
Raid1 replaced with raid10?
Hi,

I just tried an idea I got after fiddling with raid10, and to my dismay it worked as I thought it would. I used two small partitions on separate disks to create a raid1 array. Then I did dd if=/dev/md2 of=/dev/null. I got only one of the disks reading - nothing unexpected. Then I created a raid10 array on the same two partitions with the options -l10 -n2 -pf2. The same dd executed at twice the speed, reading _simultaneously_ from both drives. I did some bonnie++ benchmarking - same result: raid1 reads only from a single disk, raid10 from both. Write performance is worse (about 10% slower) with raid10, but you get twice the read speed.

In this light the obvious question is: can raid10 be used as a drop-in replacement for raid1, or is there a caveat with having the amount of disks equal to the amount of chunk copies?

Peter
Re: Help with chunksize on raid10 -p o3 array
Neil Brown wrote:
> The different block sizes in the reads will make very little difference
> to the results, as the kernel will be doing read-ahead for you. If you
> want to really test throughput at different block sizes, you need to
> insert random seeks.

Neil, thank you for the time and effort to answer my previous email. Excellent insights. I thought that read-ahead was filesystem specific, and that subsequently I would be safe using the raw device. I will definitely test with bonnie again.

>> * Why, although I have 3 identical chunks of data at any time, did
>> dstat never show simultaneous reading from more than 2 drives? Every
>> dd run was accompanied by maxing out one of the drives at 58MB/s while
>> another one was trying to catch up to various degrees depending on the
>> chunk size. Then on the next dd run two other drives would be
>> (seemingly randomly) selected and the process would repeat.
>
> Poor read-balancing code. It really needs more thought. Possibly for
> raid10 we shouldn't try to balance at all - just read from the 'first'
> copy in each case.

Is this anywhere near the top of the todo list, or are raid10 users for now bound to a maximum read speed of a two drive combination?

And a last question - earlier in this thread Bill Davidsen suggested playing with the stripe_cache_size. I tried to increase it (did just two tests though) with no apparent effect. Does this setting apply to raid1/10 at all, or is it strictly in the raid5/6 domain? If so, are there any tweaks apart from the chunk size and the layout that can affect raid10 performance?

Once again, thank you for the help.

Peter
Re: Help with chunksize on raid10 -p o3 array
Richard Scobie wrote:
> Peter Rabbitson wrote:
>> Is this anywhere near the top of the todo list, or are raid10 users
>> for now bound to a maximum read speed of a two drive combination?
>
> I have not done any testing with the md native RAID10 implementations,
> so perhaps there are some other advantages, but have you tried setting
> up your 4 drives as a RAID0 made up of a pair of RAID1s?

The advantage is higher redundancy: I can have any two drives fail in a x3 layout, unlike the raid1/0 setup, although I sacrifice available disk space. But yes, I agree that if I was after pure throughput, raid1/0 would be more beneficial, with the downside of 1.5 disk failure redundancy.
Re: Help with chunksize on raid10 -p o3 array
Bill Davidsen wrote:
> Peter Rabbitson wrote:
>> Hi, I have been trying to figure out the best chunk size for raid10
>> before...
>
> By any chance did you remember to increase stripe_cache_size to match
> the chunk size? If not, there you go.

At the end of /usr/src/linux/Documentation/md.txt it specifically says that stripe_cache_size is raid5 specific, and that made sense to me, as caching stuff to avoid re-doing parity is beneficial. I will test later today, trying to set the cache higher. Are there any guidelines on how large it should be in relation to the chunk size / number of drives for raid10?
Re: mismatch_cnt questions - how about raid10?
Neil Brown wrote:
> When we write to a raid1, the data is DMAed from memory out to each
> device independently, so if the memory changes between the two (or
> more) DMA operations, you will get inconsistency between the devices.

Does this apply to raid10 devices too? And in the case of LVM, if swap is on top of a LV which is part of a VG which has a single PV (the raid array) - will this happen as well? Or will the LVM layer take the data once and distribute exact copies of it to the PVs (in this case just the raid), effectively giving the raid array invariable data?
Re: mismatch_cnt questions - how about raid10?
Neil Brown wrote:
> On Tuesday March 6, [EMAIL PROTECTED] wrote:
>> Does this apply to raid10 devices too? And in the case of LVM, if swap
>> is on top of a LV which is part of a VG which has a single PV (the
>> raid array) - will this happen as well? Or will the LVM layer take the
>> data once and distribute exact copies of it to the PVs (in this case
>> just the raid), effectively giving the raid array invariable data?
>
> Yes, it applies to raid10 too. I don't know the details of the inner
> workings of LVM, but I doubt it will make a difference. Copying the
> data in memory is just too costly to do if it can be avoided. With LVM
> and raid1/10 it can be avoided with no significant cost. With
> raid4/5/6, not copying into the cache can cause data corruption. So we
> always copy.

I see. So basically, for those of us who want to run swap on raid1 or 10 and at the same time want to rely on mismatch_cnt for early problem detection, the only option is to create a separate md device just for the swap. Is this about right?
swap on raid
Hi,

I need to use a raid volume for swap, utilizing partitions from 4 physical drives I have available. From my experience I have three options: raid5, raid10 with 2 offset chunks, or two raid1 volumes that are swapon-ed with equal priority. However, I have a hard time figuring out what to use, as I am not really sure how I can detect the usage patterns of swap, let alone benchmark it. Has anyone done anything like this, or is there information on what kind of reads/writes the kernel performs when paging in and out?

Before you answer my question - yes, I am painfully aware of the paradigm "swap on raid is bad", and I know there are other ways to solve it, but my situation requires me to have swap. Several weeks ago a drive failed and took a full partition away, bringing the system to its knees and causing massive data corruption. I am also aware that I can use a file that will reside alongside my other data, but fragmentation makes this approach inefficient. So I am looking into placing the swap directly on a raid volume.

Thanks
Peter
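For the two-raid1s variant, equal priority needs nothing more than matching pri= values; the kernel then interleaves swap pages across same-priority devices, much like a raid0 of the two mirrors. A sketch of the fstab entries, with hypothetical md device names:

```
# /etc/fstab -- equal pri= values make the kernel stripe swap pages
# across both arrays (device names are assumed examples)
/dev/md3  none  swap  sw,pri=5  0  0
/dev/md4  none  swap  sw,pri=5  0  0
```

The same effect can be had at runtime with `swapon -p 5` on each device.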
Re: swap on raid
> The fact that you mention you are using partitions on disks that
> possibly have other partitions doing other things means raw performance
> will be compromised anyway.
>
> Regards,
> Richard

You know, I never thought about it, but you are absolutely right. The times at which my memory usage peaks coincide with high disk activity (mostly reads). In this light it actually might be better to keep the swap in a file on my raid10 (-p n3), which occupies most of these 4 drives, and hope that the md code will be able to distribute the io across idle drives. Does this sound about right?
Re: RAID0 to RAID5 upgrade
On Thu, Mar 01, 2007 at 06:12:32PM -0500, Bill Davidsen wrote:
> I have three drives, with some various partitions, currently set up
> like this.
>
>     drive0    drive1    drive2
>     hdb1      hdi1      hdk1
>       \________RAID1______/
>     hdb2      hdi2      hdk2
>     unused      \__RAID0___/
>     200GB     100GB x 2
>               hdi3      hdk3
>                 \_unused___/
>               100GB x 2
>
> What I want to have is 3 x 200 = 400GB RAID5. I would like to avoid
> copying 200GB of data to another machine and back.

Can't you do the following:

* copy the data from the raid0 to hdb2 (raid0 = hdb2 in size, you can even do a dd)
* degrade the raid1 to contain only drive0
* since you have all your data on drive0, wipe drive1 and drive2 clean, and create a degraded raid5
* copy stuff from drive0 to the new array (new fs as well, I presume)
* resync the raid5 with drive0
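In mdadm terms, the middle steps might look roughly like the sketch below. This is untested, the md device names are assumed (not from the thread), it glosses over repartitioning the drives to 200GB members, and the data is unprotected while the raid5 runs degraded:

```shell
# 1. Preserve the raid0 contents on the unused 200GB partition
#    (md device name assumed):
dd if=/dev/md1 of=/dev/hdb2 bs=1M

# 2. Degrade the raid1 down to drive0 only:
mdadm /dev/md0 --fail /dev/hdi1 --remove /dev/hdi1
mdadm /dev/md0 --fail /dev/hdk1 --remove /dev/hdk1

# 3. Create a degraded 3-member raid5 on the two freed drives;
#    the literal word "missing" reserves the third slot:
mdadm -C /dev/md2 -l 5 -n 3 /dev/hdi1 /dev/hdk1 missing

# 4. mkfs the new array, mount it, copy everything off drive0, then...

# 5. ...complete the array with drive0's partition and let it resync:
mdadm /dev/md2 --add /dev/hdb1
```

The "missing" keyword in step 3 is what lets the array be created and used before the last drive is available.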
RAID10 Resync fails with specific chunk size and drive sizes (reproducible)
Hi,

I think I've hit a reproducible bug in the raid10 driver, tried on two different machines with kernels 2.6.20 and 2.6.18. This is a script to simulate the problem:

==
#!/bin/bash
modprobe loop
for ID in 1 2 3 ; do
    echo -n "Creating loopback device $ID... "
    dd if=/dev/zero of=dsk${ID}.img bs=512 count=995967
    losetup /dev/loop${ID} dsk${ID}.img
    echo "done."
done
mdadm -C /dev/md2 -l 10 -n 3 -p o2 -c 2048 /dev/loop1 /dev/loop2 /dev/loop3
echo "Raid device assembled, check /proc/mdstat's output when resync is finished"
==

This is the output I get in /proc/mdstat after the resync settles:

==
md2 : active raid10 loop3[2] loop2[3](F) loop1[0]
      746496 blocks 2048K chunks 2 offset-copies [3/2] [U_U]
==
Re: RAID10 Resync fails with specific chunk size and drive sizes (reproducible)
After I sent the message, I received the 6 patches from Neil Brown. I applied the first one ("Fix Raid10 recovery problem") and it seems to be taking care of the issue I am describing, probably due to the rounding fixes.

Thanks