Re: raid1 with 1.2 superblock never marked healthy?
On 2006-02-20 at 09:30:22, Neil Brown wrote:
> If you use an 'internal' bitmap (which is mirrored across all drives
> much like the superblock) then you don't need to specify a file name.
> However if you want the bitmap on a separate file, you have to have
> that name 'hard coded' in mdadm.conf (or similar).

I was under the impression that mdadm.conf is not extended for this :)
That would be a nice place compared to /etc/rc.d/whatever..

I was considering using an external bitmap only because I've been bitten
a few times with journaling filesystems vs. cheap hard disks.  I'm not
sure if a few hard disks are a significant sample, but some of them
started developing bad sectors where the journal is stored.  I hoped
that having an external bitmap would reduce the wear on the mirrored
parts.  This way, if the bitmap (even on the same hard disk) is getting
flawed, probably both of the whole mirrors are intact enough for a last
(additional) backup.

> > I remember stopping/starting the array correctly does a resync
> > again, even without a reboot.
>
> Hmm... it seems to work for me...  How exactly do you start it again?

Oops, I did not mean resync, but that spare confusion stuff.  When I do
mdadm -S /dev/md0, and then mdadm -A /dev/md0, I get:

  raid1: raid set md0 active with 1 out of 2 mirrors

And I have to -r /dev/hda3, -a /dev/hda3, and that results in another
resync.

> No.  mdadm does not record the name of the bitmap file in the
> superblock.  Just like it does not record the names of component
> devices in the superblock.

Would it be a bad idea (apart from someone having to do the work :)?
(But probably a bit better would be doing it the way the jfs/ext3
external journals store a uuid connecting the journal with the device
itself.)

> >   Array State : uu 1 failed
>
> Something is definitely wrong here... hda3 looks like a spare, but
> isn't.  I'll have a look and see what I can find out.
The only unusual thing is how it got set up: on a semi-live system, I
started with the magic "missing" component to create another half
mirror while the previous one was running.  Unusual because I never
thought of it as a bad idea, but maybe somehow it did cause what I'm
seeing.  The original command (then, trying to use bitmaps :) was:

  # mdadm --create /dev/md1 --level 1 -n 2 -d 4 -e 1.2 \
      -b /etc/md/test1.bin --bitmap-chunk 64 missing /dev/hdc3

At some later time, I added /dev/hda3, which is the troubling spare now.

Janos
-
To unsubscribe from this list: send the line "unsubscribe linux-raid"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
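The write-intent bitmap discussed in this thread can be pictured with a
small toy model (this is only a sketch of the idea, not mdadm's on-disk
bitmap format or its actual code): a chunk's bit is set before any write
to that chunk and cleared once the write has reached all mirrors, so a
post-crash resync only has to copy the chunks whose bits are still set.

```python
# Toy model of a write-intent bitmap (not mdadm's real format):
# dirty a chunk's bit before writing, clear it when the write has
# reached all mirrors; after a crash, only still-dirty chunks need
# resyncing.

CHUNK_KIB = 64  # matches the --bitmap-chunk 64 in the command above


class WriteIntentBitmap:
    def __init__(self, device_kib, chunk_kib=CHUNK_KIB):
        self.chunk_kib = chunk_kib
        # one bit per chunk; rounded up to cover the whole device
        self.nbits = -(-device_kib // chunk_kib)
        self.dirty = set()  # chunk numbers with in-flight writes

    def start_write(self, offset_kib):
        self.dirty.add(offset_kib // self.chunk_kib)

    def finish_write(self, offset_kib):
        self.dirty.discard(offset_kib // self.chunk_kib)

    def chunks_to_resync(self):
        # what a post-crash resync would have to copy between mirrors
        return sorted(self.dirty)


bm = WriteIntentBitmap(device_kib=1024 * 1024)  # a 1 GiB device
bm.start_write(0)        # chunk 0 dirtied
bm.start_write(128)      # chunk 2 dirtied
bm.finish_write(0)       # this write completed before the "crash"
print(bm.chunks_to_resync())   # -> [2]
```

The point of keeping this bitmap in a separate file (as the -b option
above does) rather than internal is exactly the wear trade-off Janos
describes: the frequently rewritten bits live away from the mirrored
data.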
Re: NVRAM support
Hello,

We have applications where large data sets (e.g. 100 MB) are written
sequentially.  Software RAID could do a full stripe update (without
reading/using existing data).  Does this happen in parallel?  If yes,
isn't that data vulnerable when a crash occurs?

Thanks,
Mirko

Neil Brown schrieb:
> On Wednesday February 15, [EMAIL PROTECTED] wrote:
> > Hi,
> > My intention was not to use a NVRAM device for swap.  Enterprise
> > storage systems use NVRAM for better data protection/faster
> > recovery in case of a crash.  Modern CPUs can do RAID calculation
> > very fast.  But Linux RAID is vulnerable when a crash occurs during
> > a write operation.  E.g. data and parity write requests are issued
> > in parallel but only one finishes.  This will lead to inconsistent
> > data.  It will be undetected and can not be repaired.  Right?
>
> Wrong.  Well, maybe 5% right.  If the array is degraded, then the
> inconsistency cannot be detected.  If the array is fully functioning,
> then any inconsistency will be corrected by a 'resync'.
>
> > How can journaling be implemented within linux-raid?
>
> With a fair bit of work. :-)
>
> > I have seen a paper that tries this in cooperation with a file
> > system: "Journal-guided Resynchronization for Software RAID"
> > www.cs.wisc.edu/adsl/Publications
>
> This is using the ext3 journal to make the 'resync' (mentioned above)
> faster.  Write-intent bitmaps can achieve similar speedups with
> different costs.
>
> > But I would rather see a solution within md so that other file
> > systems or LVM can be used on top of md.
>
> Currently there is no solution to the "crash while writing and
> degraded on restart means possible silent data corruption" problem.
> However it is, in reality, a very small problem (unless you regularly
> run with a degraded array - don't do that).  The only practical fix
> at the filesystem level is, as you suggest, journalling to NVRAM.
>
> There is work underway to restructure md/raid5 to be able to off-load
> the xor and raid6 calculations to dedicated hardware.
> This restructure would also make it a lot easier to journal raid5
> updates, thus closing this hole (and also improving write latency).
>
> NeilBrown
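The failure mode Neil describes can be shown numerically (a minimal
sketch with single-byte "chunks", not md's actual code): RAID5 parity
is the XOR of the data chunks, so if a crash lets a data write land
while its parity write is lost, a complete array can detect and fix the
mismatch on resync, but a degraded array silently reconstructs stale
data from the stale parity.

```python
from functools import reduce
from operator import xor

# RAID5-style parity is the XOR of the data chunks in a stripe.
def parity(chunks):
    return reduce(xor, chunks)

data = [0xAA, 0xBB, 0xCC]
p = parity(data)              # consistent stripe on disk

data[1] = 0x11                # the data write completes...
# ...crash: the parallel parity write never happens, so p is stale.

# Array complete after the crash: the mismatch is detectable, and a
# resync just recomputes parity from the surviving data.
assert parity(data) != p

# Array degraded after the crash (chunk 1 lost): reconstruction from
# the stale parity silently returns the OLD value, not 0x11.
reconstructed = p ^ data[0] ^ data[2]
print(hex(reconstructed))     # -> 0xbb
```

This is the "maybe 5% right" case: the corruption only occurs when the
crash and a disk failure combine, which is why Neil calls it a very
small problem in practice.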
Re: raid1 with 1.2 superblock never marked healthy?
Hello Neil & All,

On Mon, 20 Feb 2006, Janos Farkas wrote:
> On 2006-02-20 at 09:30:22, Neil Brown wrote:
> > If you use an 'internal' bitmap (which is mirrored across all
> > drives much like the superblock) then you don't need to specify a
> > file name.  However if you want the bitmap on a separate file, you
> > have to have that name 'hard coded' in mdadm.conf (or similar).
>
> I was under the impression that mdadm.conf is not extended for this :)
> That would be a nice place compared to /etc/rc.d/whatever..
>
> I was considering using an external bitmap only because I've been
> bitten a few times with journaling filesystems vs. cheap hard disks.
> I'm not sure if a few hard disks are a significant sample, but some
> of them started developing bad sectors where the journal is stored.
> I hoped that having an external bitmap would reduce the wear on the
> mirrored parts.  This way, if the bitmap (even on the same hard disk)
> is getting flawed, probably both of the whole mirrors are intact
> enough for a last (additional) backup.
> ...snip...

How hard would it be for mdadm/md to allow use of both internal and
external bitmaps?  And then mdadm be extended to do a compare of an
external against an internal when the operator asks it to do so.  Does
this sound reasonable?  Thoughts?

This is probably an edge case, I guess.  But it might save someone's
bacon if this functionality was available.

Tia,
JimL
--
 | James W. Laferriere | System Techniques    | Give me VMS   |
 | Network Engineer    | 3542 Broken Yoke Dr. | Give me Linux |
 | [EMAIL PROTECTED]   | Billings, MT 59105   | only on AXP   |
 | http://www.asteriskhelpdesk.com/cgi-bin/astlance/r.cgi?babydr |
Avoiding resync of RAID1 during creation
With FC2, when installing a fresh new system we would create a RAID10
array by creating several RAID1s, then adding all of those to a RAID0
array.  To make the RAID1 devices, we'd use the command:

  /sbin/mkraid --really-force --dangerous-no-resync /dev/mdX

Then we'd set up the RAID0 and mke2fs our filesystems on top of it.
This worked well for us; we never had any problems later.  As soon as
the kickstart was finished, the system was ready to go.

Now with FC4, raidtools is gone and I'm left with the mdadm tools.  As
far as I can tell, mdadm has nothing resembling --dangerous-no-resync.
I've updated my kickstart to use mdadm instead of mkraid using:

  /sbin/mdadm --create /dev/md4 --force --run --level=1 --chunk=256 \
      --raid-disks=2 --spare-devices=0 /dev/sda5 /dev/sde5

This causes all of the newly created RAID1 devices to start syncing.
On a system with many large disks and RAID1 arrays, syncing takes a
considerably long time.  Is there any way to avoid the sync after
creation when using mdadm, like I could with mkraid?

The compelling argument I've read in the archives indicates this would
run counter to ensuring both partitions were completely clean at the
block level.  I would think creation of the filesystem on top of the
array would ensure they're clean, at least on that level, for all
intents and purposes.

--bryan
Re: block level vs. file level
it wrote:
> Ouch.  How does hardware raid deal with this?  Does it?

Hardware RAID controllers deal with this by rounding the size of
participant devices down to the nearest GB, on the assumption that no
drive manufacturer would have the guts to actually sell e.g. a 250 GB
drive with less than exactly 250,000,000,000 bytes of space on it.

(It would be nice if the various flavors of Linux fdisk had an option
to do this.  It would be very nice if anaconda had an option to do
this.)
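The rounding rule described above amounts to truncating the device size
to a whole number of decimal gigabytes, so that a replacement "250 GB"
drive a few megabytes smaller than the original still fits the array.
A one-line sketch (the example byte count is an arbitrary illustration,
not any particular drive model):

```python
# Round a device size down to a whole number of decimal gigabytes,
# as hardware RAID controllers do before admitting it to an array.
GB = 1000 ** 3

def round_down_to_gb(size_bytes):
    return (size_bytes // GB) * GB

# A "250 GB" drive that actually has a bit more than 250 * 10^9 bytes:
print(round_down_to_gb(250_059_350_016))   # -> 250000000000
```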
Re: Avoiding resync of RAID1 during creation
Tuomas Leikola wrote:
> mdadm --assume-clean
>
> from the man page:
>   It can also be used when creating a RAID1 or RAID10 if you want to
>   avoid the initial resync, however this practice - while normally
>   safe - is not recommended.

What version of mdadm was that from?  From mdadm(8) in
mdadm-1.11.0-4.fc4 on my systems:

  --assume-clean
      Tell mdadm that the array pre-existed and is known to be clean.
      This is only really useful for Building RAID1 array.  Only use
      this if you really know what you are doing.  This is currently
      only supported for --build.

I tried with --assume-clean, it still wanted to sync.  From what my
man page was telling me, it only works with --build.  If I use --build
it'll go ahead without syncing, but I need per-device superblocks.
Why mdadm didn't error when I used --assume-clean with --create, I
don't know.

--bryan
Re: Avoiding resync of RAID1 during creation
On 2/20/06, Bryan Wann [EMAIL PROTECTED] wrote:
> > mdadm --assume-clean
>
> What version of mdadm was that from?  From mdadm(8) in
> mdadm-1.11.0-4.fc4 on my systems:
[cut]
> I tried with --assume-clean, it still wanted to sync.

The man page I quoted was from 2.3.1 (6 Feb) - relatively new.  I
tested this with 2 boxes: 1.9.0 starts the resync and 2.3.1 doesn't.
Used kernel 2.6.14 - although I don't expect that to make much of a
difference.

-tuomas
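The "normally safe" caveat in the newer man page can be illustrated
with a toy model of two unsynced mirrors (block values and the set of
"filesystem-touched" blocks below are made up for illustration): every
block the filesystem writes is duplicated to both halves and so ends up
consistent, while blocks the filesystem never touches may still carry
mismatched factory garbage, which is exactly what the skipped initial
resync would have cleaned up (and what a later 'check' pass would
report as mismatches).

```python
import random

# Two freshly partitioned mirror halves full of unrelated old data.
random.seed(0)
mirror_a = [random.randrange(256) for _ in range(16)]
mirror_b = [random.randrange(256) for _ in range(16)]

# Blocks the new filesystem happens to write (e.g. during mke2fs);
# RAID1 sends every write to both mirrors.
written = {0, 1, 5, 9}
for blk in written:
    mirror_a[blk] = mirror_b[blk] = blk * 7

# Every block the filesystem knows about is now consistent...
assert all(mirror_a[b] == mirror_b[b] for b in written)

# ...but never-written blocks may still differ between the halves.
mismatches = [b for b in range(16) if mirror_a[b] != mirror_b[b]]
print(mismatches)
```

This is why mke2fs on top of the array makes --assume-clean workable
in practice, as Bryan argues, even though the halves are not clean at
the raw block level.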
Re: NVRAM support
On Monday February 20, [EMAIL PROTECTED] wrote:
> Hello,
> We have applications where large data sets (e.g. 100 MB) are written
> sequentially.  Software RAID could do a full stripe update (without
> reading/using existing data).  Does this happen in parallel?  If yes,
> isn't that data vulnerable when a crash occurs?

md/raid5 does full stripe writes about 80% of the time when I've
measured it while doing large writes.  I don't know why it is not
closer to 100%.  I suspect some subtle scheduling issue that I haven't
managed to get to the bottom of yet (I should get back to that).

Data is only vulnerable if, after the crash, the array is degraded.
If the array is still complete after the crash, then there is no loss
of data.

NeilBrown
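The distinction between a full-stripe write and a partial one comes
down to how parity is updated (a sketch with single-byte chunks, not
md's actual code): a full-stripe write computes parity purely from the
new data and needs no reads, while a partial write must first read the
old data chunk and old parity (read-modify-write).  Both paths must
produce the same parity:

```python
from functools import reduce
from operator import xor

# Full-stripe path: parity from the new data alone, no reads needed.
def full_stripe_parity(chunks):
    return reduce(xor, chunks)

# Read-modify-write path: cancel the old chunk out of the old parity,
# then fold in the new chunk.
def rmw_parity(old_parity, old_chunk, new_chunk):
    return old_parity ^ old_chunk ^ new_chunk

stripe = [0x10, 0x20, 0x30, 0x40]
p = full_stripe_parity(stripe)

new_stripe = stripe[:]
new_stripe[2] = 0x99               # partial update of one chunk
assert rmw_parity(p, stripe[2], 0x99) == full_stripe_parity(new_stripe)
print("parity paths agree")
```

This is why large sequential writes are the best case for raid5: when
a whole stripe's worth of new data is available, the read half of
read-modify-write disappears.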
Re: Bigendian issue with mdadm
On Tue, Feb 21, 2006 at 10:44:22AM +1100, Neil Brown wrote:
> On Monday February 20, [EMAIL PROTECTED] wrote:
> > Hi All,
> > Please, help!  I've created a raid5 array on an x86 platform, and
> > now wish to use it on a mac mini (G4 based).  But the problem is:
> > the first is little-endian, the second big-endian...  And it seems
> > like the md superblock disk format is host-endian, so how should I
> > tell mdadm which endianness to use?
>
> Read the man page several times?
> Look for --update=byteorder
> You need mdadm-2.0 or later.

Besides, IIRC the version 1 superblock is always little-endian.

L.
--
Luca Berra -- [EMAIL PROTECTED]
Communication Media Services S.r.l.
Re: Bigendian issue with mdadm
On Tuesday February 21, [EMAIL PROTECTED] wrote:
> On Tue, Feb 21, 2006 at 10:44:22AM +1100, Neil Brown wrote:
> > On Monday February 20, [EMAIL PROTECTED] wrote:
> > > ...And it seems like the md superblock disk format is
> > > host-endian, so how should I tell mdadm which endianness to use?
> >
> > Read the man page several times?
> > Look for --update=byteorder
> > You need mdadm-2.0 or later.
>
> Besides, IIRC the version 1 superblock is always little-endian.

True.  v1 is little-endian, not host-endian, so this issue won't
appear if using v1 metadata.

However the default is 0.90, and I'm still finding occasional bugs in
the v1 code, so I'm not likely to change the default any time soon...
probably not until a year after I'm as confident of the v1 code as of
the v0.90 code.

NeilBrown
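The host-endian problem behind this thread is easy to demonstrate (the
field and value below are illustrative, not the real md superblock
layout): a 32-bit integer written in x86 native byte order decodes to a
completely different number when read back in the G4's native order,
which is why the fixed little-endian v1 format and --update=byteorder
exist.

```python
import struct

# A 32-bit superblock field (illustrative, not the real md layout)
# written in x86 native byte order, i.e. little-endian.
raid_disks = 2
on_disk = struct.pack('<I', raid_disks)

# The same four bytes interpreted on each architecture:
print(struct.unpack('<I', on_disk)[0])   # x86 reads: 2
print(struct.unpack('>I', on_disk)[0])   # big-endian G4 reads: 33554432
```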