Re: /bsd raid0 Error re-writing parity!
[please CC me on any replies, as I'm not on m...@openbsd.org]

Siju George writes:
> Hi, I am not able to re-write the parity for my RAID set.
>
> # raidctl -Sv raid0
> raid0 Status:
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
> raidctl: ioctl () failed
>
> # raidctl -sv raid0
> raid0 Components:
>            /dev/wd0d: failed
>            /dev/wd1d: optimal

You cannot rebuild parity in this case because one of your disks has failed (parity re-writing can only happen if all the disks are 'good'). If you wish to attempt to rebuild that disk, you can do so with:

  raidctl -vR /dev/wd0d raid0

to do a reconstruct-in-place. Of course, finding out the real reason why that disk has failed should also be at the top of your TODO list :) (e.g. if the disk is having physical read/write errors, you probably want to replace it before doing the rebuild.)

Later... Greg Oster
Re: RaidFrame woes on 4.2 (RAIDFRAME: failed rf_ConfigureDisks with 2)
knitti writes:
> Hi, I tried to set up a RAID 1 softraid with raidframe, but no matter
> what I try, the RAID refuses to configure. So please, if anyone has an
> idea what I may have missed...
>
> # raidctl -C raid0.conf raid0
> raidctl: ioctl (RAIDFRAME_CONFIGURE) failed
>
> this adds the following lines to the dmesg buffer:
> raidlookup on device: /dev/wd3d failed !
                                  ^
I suspect you have an extra space after wd3d in the config file... And, unfortunately, that annoying little non-feature is enough to stop RAIDframe in its tracks... :(

(A fix for the issue is here:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sbin/raidctl/rf_configure.c.diff?r1=1.19&r2=1.20 )

Otherwise what you have is just fine..

Later... Greg Oster
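Since the config parser chokes on trailing whitespace, a quick check before running `raidctl -C` can save some head-scratching. This is a generic sketch; the sample file contents below are illustrative, not taken from the thread:

```shell
# Flag any line in a raidctl config file that ends in a space.
# The sample file deliberately reproduces the "extra space after wd3d"
# problem; a real config would have more sections.
printf 'START disks\n/dev/wd2d\n/dev/wd3d \n' > /tmp/raid0.conf
grep -n ' $' /tmp/raid0.conf   # prints the offending line with its number
```

If grep prints nothing (and exits non-zero), the file is free of trailing spaces and safe to feed to raidctl.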
Re: RAID1 powerloss - can parity rewrite be safely backgrounded?
Matt writes:
> As for the suggestion of hardware raid - unfortunately this is a live
> server. If I migrate it to another machine I will definitely try
> hardware raid. I know it is a lot faster

Really? :) There is no guarantee that a hardware RAID is faster than a software RAID, or vice-versa. There is also no guarantee that a commercial software RAID solution is faster than RAIDframe... ;) Hardware RAID is just software RAID on a card. And so whether a hardware implementation of software RAID is faster or slower than a traditional software RAID just depends on where the bottlenecks have been moved to :) Filesystems, data mixes, and underlying hardware will still all be important parts...

> but would that solve the parity problem on boot completely? 'man bio'
> doesn't seem to answer that.

It depends on how the hardware RAID card keeps track of which parity bits are up-to-date :) If you don't have a good battery in the thing, then you might just be in the same boat as you are with RAIDframe (but because it's all hidden, you might not know it!). Don't let the idea that hardware RAID is automatically better lull you into a false sense of security; understand the features and benefits of both, do the analysis, and pick the one that will work best for you.

Later... Greg Oster
Re: RAID1 powerloss - can parity rewrite be safely backgrounded?
Brian A. Seklecki writes:
> raid(4) hasn't been touched in a while (years), so short answer: No.
> NetBSD is still actively committing to it, though, and has functional
> background parity recalculation.

Just to be clear here: the background parity checking in NetBSD as of today is functionally the same as what OpenBSD has right now. The implications here are as follows: if the parity is checked in the background, and a non-parity component should fail, there is a very low, but non-zero, probability of data loss. The longer it takes to check (and correct, if necessary) the parity, the greater the chance of loss. The value of your data should dictate whether you can live with that increase in risk.

For the record, I do the parity checking in the background on all the machines I look after. Since most of them can complete the check in under an hour, there is that one-hour window where some fragments of corruption *may* have occurred (and that didn't get caught with a filesystem check).

> I understand there is interest in replacing RAIDFrame instead of
> resynchronizing the subtree. In the mean time, find a hardware RAID
> controller that can be managed by OpenBSD via bio(4) and grab a UPS
> that works with upsd(8).

I worry more about a hardware RAID card forgetting its configuration after a power outage than I do about parity checking in the background :) (What do you mean these 14 disks in this 2TB hardware RAID array are now all 'unassigned'!?!?!?! That wasn't a fun day.)

Later... Greg Oster

On Thu, 27 Sep 2007, Rob wrote:
> On 9/25/07, Matt [EMAIL PROTECTED] wrote:
> > I'm running a RAID1 mirror on OpenBSD 4.1 (webserver). On a power
> > failure the parity becomes dirty and needs rewriting, which results
> > in 1.5 hours 'downtime'. Is it safe to background this in /etc/rc or
> > is that a no-no? I found a reference this was possible/safe on-list
> > but it was a) 2003 and b) dealt with RAID5. I'd like to make sure I
> > am not doing something dangerous.
I frankly don't know enough to guarantee that this is safe, or not, but I had a RAID1 with big disks on an ancient machine that took about 26 hours to check parity (! -- this wasn't my idea), and I modified its rc to boot up and then begin performing the parity check in the background. The only caveat I would give is that the operating system was installed and running on a 3rd, separate disk, and that network access to the mirrored drives was disabled until the parity rewrite was complete.

- R.
Re: Seeking info for RAID 1 on OpenBSD
L. V. Lammert writes:
> On Fri, 3 Aug 2007, Joel Knight wrote:
> > --- Quoting HDC on 2007/08/02 at 20:26 -0300:
> > > Read this... http://www.packetmischief.ca/openbsd/doc/raidadmin/
> >
> > I used to use raidframe and followed the procedures in that doc for
> > doing so, but now there's no point. If the system requires any type
> > of raid, go hardware. Long live bio(4).
>
> IF you choose to NOT use a h/w controller, use rsync instead. Permits
> quick recovery in the case of a drive failure (swap drive cables,
> reboot), does not require lengthy parity rebuild.

And you only lose the data written since the last rsync... and your system probably goes down instead of staying up until you can fix it..

RAIDframe, like hardware RAID and rsync, is just another tool. Understand the pros and cons of each, but be willing to accept the risks associated with whatever you choose... (if you think hardware RAID is riskless, then you've never had a 2TB RAID set suddenly decide that all components were offline and mark them as such!)

For the folks who dislike the long parity checks... If you're willing to accept a window during which some of your data *might* be at risk, change:

  raidctl -P all

to something like:

  sleep 3600 ; raidctl -P all

in /etc/rc . This will, of course, delay the start of the parity computation for an hour or so, giving your system a chance to do the fscks and get back to multi-user as quickly as possible. The risk here is as follows (this is for RAID 1.. risks for RAID 5 are slightly higher):

1) Even though parity is marked 'dirty', it might actually be in sync. In this case, if you have a component failure, your data is fine.

2) Until the parity check is done, only the 'master' component is used for reading, but any writes are mirrored to both components. That means that when the fsck is being done, any problems found will be fixed on *both* components, and writes will keep the two in sync even before parity is checked.
3) Where the risk of data loss comes in is if the master dies before the parity check gets done. In this case, data on the master that was not re-written or that was out-of-sync with the slave will be lost. This could result in the loss of pretty much anything.

The important thing here is for you to evaluate your situation and decide whether this level of risk is acceptable... For me, I use the equivalent of 'sleep 3600' on my home desktop.. and slightly modified versions of it on other home servers and other boxen I look after.. But don't blindly listen to me or anyone else -- learn what the risks are for your situation, determine what level of risk you can accept, and go from there...

Later... Greg Oster
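The delayed-check idea above, sketched as it might appear in /etc/rc. This is a sketch only: whether the compound command also needs to be backgrounded with `&` (as shown here) depends on how your version of rc invokes the parity check, and is an assumption, not part of the original advice:

```shell
# Instead of checking parity immediately at boot:
#   raidctl -P all
# delay it for an hour so fsck and the climb to multi-user finish first.
# The '&' keeps the sleep itself from holding up the boot.
( sleep 3600 ; raidctl -P all ) &
```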
Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64
Siju George writes:
> On 3/8/07, Greg Oster [EMAIL PROTECTED] wrote:
> > Siju George writes:
> > > In my dmesg at one point it says
> > > ==
> > > Kernelized RAIDframe activated
> > > dkcsum: wd0 matches BIOS drive 0x80
> > > dkcsum: wd1 matches BIOS drive 0x81
> > > root on wd0a
> >
> > So this gets printed from autoconf.c but it *shouldn't* since
> >   boothowto |= RB_DFLTROOT;
> > in rf_openbsdkintf.c should cause the setroot() function to bail
> > before printing the above. So for some reason it's not calling the
> > appropriate bits in rf_buildroothack() in rf_openbsdkintf.c. But
> > exactly why, I have no idea...
> > [snip]
> > > Could you please shed any light on why my root device is still
> > > wd0a and not raid0?
> >
> > No idea right now.. if you build a kernel with RAIDDEBUG defined and
> > send the dmesg from that, I might be able to provide additional
> > info...
>
> alright, thank you :-) here it is. hope it will help you see more into
> the issue :-)
> [snip]
> Kernelized RAIDframe activated
> Searching for raid components...
> dkcsum: wd0 matches BIOS drive 0x80
> dkcsum: wd1 matches BIOS drive 0x81
> root on wd0a
> rootdev=0x0 rrootdev=0x300 rawdev=0x302
> RAIDFRAME: protectedSectors is 64.
> raid0: Component /dev/wd0d being configured at row: 0 col: 0
>   Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
>   Version: 2 Serial Number: 200612010 Mod Counter: 844
>   Clean: Yes Status: 0
> raid0: Component /dev/wd1d being configured at row: 0 col: 1
>   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
>   Version: 2 Serial Number: 200612010 Mod Counter: 844
>   Clean: Yes Status: 0
> RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep
> limit.
> raid0 (root)
> #

So this is still not the output I'd expect... what does 'disklabel wd0' and 'disklabel wd1' say? Are wd0d and wd1d of type FS_RAID?? You should be seeing a Component on wd0d and then the full component label, and that should be printed before the dkcsum bits... It's still almost as though RAID_AUTOCONFIG isn't defined... (but it is, since the Searching... line above is printed...)

Later... Greg Oster
Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64
Siju George writes:
> On 3/8/07, Greg Oster [EMAIL PROTECTED] wrote:
> > [snip]
> > > Kernelized RAIDframe activated
> > > Searching for raid components...
> > > dkcsum: wd0 matches BIOS drive 0x80
> > > dkcsum: wd1 matches BIOS drive 0x81
> > > root on wd0a
> > > rootdev=0x0 rrootdev=0x300 rawdev=0x302
> > > RAIDFRAME: protectedSectors is 64.
> > > raid0: Component /dev/wd0d being configured at row: 0 col: 0
> > >   Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
> > >   Version: 2 Serial Number: 200612010 Mod Counter: 844
> > >   Clean: Yes Status: 0
> > > raid0: Component /dev/wd1d being configured at row: 0 col: 1
> > >   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
> > >   Version: 2 Serial Number: 200612010 Mod Counter: 844
> > >   Clean: Yes Status: 0
> > > RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head
> > > sep limit.
> > > raid0 (root)
> > > #
> >
> > So this is still not the output I'd expect... what does 'disklabel
> > wd0' and 'disklabel wd1' say? Are wd0d and wd1d of type FS_RAID??
>
> nope :-( So that is the reason right?

Yes.

> is there any hope of fixing it now?

It should just work to change 4.2BSD to RAID... as long as you're never actually mounting /dev/wd0d or /dev/wd1d anywhere, it'll be fine...

> Will the raid be functioning right actually? Do you want me to
> recreate it with FS_RAID?

You should only need to tweak the disklabel. If you boot single-user you should see root on /dev/raid0a .. at that point you can mount / read-write and fix /etc/fstab if necessary. You shouldn't need to rebuild the RAID set...

> [snip]
> > You should be seeing a Component on wd0d and then the full component
> > label, and that should be printed before the dkcsum bits... It's
> > still almost as though RAID_AUTOCONFIG isn't defined... (but it is,
> > since the Searching... line above is printed...)
>
> RAID_AUTOCONFIG is defined but for that to work the FS type should be
> FS_RAID right?

Yes... if it's not FS_RAID, then for i386/amd64/(and others) it won't even consider the partition for autoconfig...

> Do you think this setup is bad actually?

Nope... just needs a disklabel change and it should work...

Later... Greg Oster
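A sketch of the disklabel tweak being discussed, using the partition letters from the thread. The interactive commands shown are from disklabel(8)'s `-E` editor; the exact prompts may differ by release:

```shell
# Change the fstype of the RAID component partitions from 4.2BSD to
# RAID so the kernel's autoconfig pass will consider them.
disklabel -E wd0
#   > m d          # modify partition 'd'
#   ...keep the offset and size, change the FS type to 'RAID'...
#   > w            # write the new label
#   > q
# Repeat for wd1, then reboot; root should then come up on raid0a.
```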
Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64
Siju George writes:
> On 3/6/07, Greg Oster [EMAIL PROTECTED] wrote:
> > It's working just fine... just probably telling you a bit more than
> > you really wanted to know :) Later...
>
> Greg, seeing that you work on RAIDFRAME let me dare to ask you one
> more thing :-) In my dmesg at one point it says
> ==
> Kernelized RAIDframe activated
> dkcsum: wd0 matches BIOS drive 0x80
> dkcsum: wd1 matches BIOS drive 0x81
> root on wd0a
>
> Shouldn't the root be on raid0a ?

I don't know what state OpenBSD is in with respect to root-on-RAID.

> Since the dmesg again shows
> ===
> raid0 (root)
> ===
> and raidctl shows
> =
> Autoconfig: Yes
> Root partition: Yes
> for both drives?

what does 'mount' say for '/'? RAIDframe used to do a bit of 'hijacking' of the boot disk in order to get itself in as /, but I don't know the details in OpenBSD...

> [snip]
> Could you please shed any light on why my root device is still wd0a
> and not raid0?

That part is specific to OpenBSD...

> Thank you so much (especially for the simple make file introduction on
> your website)
>
> kind Regards
> Siju
>
> For my full dmesg:
> OpenBSD 4.0 (GENERIC.RAID2) #0: Fri Nov 24 20:28:14 IST 2006
> [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.RAID2
> real mem = 1039593472 (1015228K)
> avail mem = 878211072 (857628K)
> using 22937 buffers containing 104165376 bytes (101724K) of memory
> mainbus0 (root)
> bios0 at mainbus0: SMBIOS rev.
2.3 @ 0xfc650 (54 entries) bios0: Acer Aspire Series cpu0 at mainbus0: (uniprocessor) cpu0: AMD Athlon(tm) 64 Processor 3400+, 2193.94 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36 ,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 16-way L2 cache cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative pci0 at mainbus0 bus 0: configuration mode 1 pchb0 at pci0 dev 0 function 0 ATI RS480 Host rev 0x10 ppb0 at pci0 dev 1 function 0 ATI RS480 PCIE rev 0x00 pci1 at ppb0 bus 1 vga1 at pci1 dev 5 function 0 ATI Radeon XPRESS 200 rev 0x00 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation) wsdisplay0: screen 1-5 added (80x25, vt100 emulation) pciide0 at pci0 dev 17 function 0 ATI IXP400 SATA rev 0x80: DMA pciide0: using irq 11 for native-PCI interrupt pciide0: port 0: device present, speed: 1.5Gb/s wd0 at pciide0 channel 0 drive 0: ST3120827AS wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors wd0(pciide0:0:0): using BIOS timings, Ultra-DMA mode 6 pciide0: port 1: device present, speed: 1.5Gb/s wd1 at pciide0 channel 1 drive 0: ST3120827AS wd1: 16-sector PIO, LBA48, 114473MB, 234441648 sectors wd1(pciide0:1:0): using BIOS timings, Ultra-DMA mode 6 pciide1 at pci0 dev 18 function 0 ATI IXP400 SATA rev 0x80: DMA pciide1: using irq 5 for native-PCI interrupt ohci0 at pci0 dev 19 function 0 ATI IXP400 USB rev 0x80: irq 4, version 1.0, legacy support usb0 at ohci0: USB revision 1.0 uhub0 at usb0 uhub0: ATI OHCI root hub, rev 1.00/1.00, addr 1 uhub0: 4 ports with 4 removable, self powered ohci1 at pci0 dev 19 function 1 ATI IXP400 USB rev 0x80: irq 4, version 1.0, legacy support usb1 at ohci1: USB revision 1.0 uhub1 at usb1 uhub1: ATI OHCI root hub, rev 1.00/1.00, addr 1 uhub1: 4 ports with 4 removable, self powered ehci0 at pci0 dev 19 function 
2 ATI IXP400 USB2 rev 0x80: irq 4 usb2 at ehci0: USB revision 2.0 uhub2 at usb2 uhub2: ATI EHCI root hub, rev 2.00/1.00, addr 1 uhub2: 8 ports with 8 removable, self powered piixpm0 at pci0 dev 20 function 0 ATI IXP400 SMBus rev 0x81: SMI iic0 at piixpm0 unknown at iic0 addr 0x2f not configured pciide2 at pci0 dev 20 function 1 ATI IXP400 IDE rev 0x80: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility azalia0 at pci0 dev 20 function 2 ATI IXP450 HD Audio rev 0x01: irq 5 azalia0: host: High Definition Audio rev. 1.0 azalia0: codec: Realtek ALC880 (rev. 8.0), HDA version 1.0 audio0 at azalia0 pcib0 at pci0 dev 20 function 3 ATI IXP400 ISA rev 0x80 ppb1 at pci0 dev 20 function 4 ATI IXP400 PCI rev 0x80 pci2 at ppb1 bus 2 re0 at pci2 dev 3 function 0 Realtek 8169 rev 0x10: irq 5, address 00:16:17:20:2a:a6 rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2 pchb1 at pci0 dev 24 function 0 AMD AMD64 HyperTransport rev 0x00 pchb2 at pci0 dev 24 function 1 AMD AMD64 Address Map rev 0x00 pchb3 at pci0 dev 24 function 2 AMD AMD64 DRAM Cfg rev 0x00 pchb4 at pci0 dev 24 function 3 AMD AMD64 Misc
Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64
Siju George writes:
> On 3/6/07, Greg Oster [EMAIL PROTECTED] wrote:
> > It's working just fine... just probably telling you a bit more than
> > you really wanted to know :) Later...
>
> Greg, seeing that you work on RAIDFRAME let me dare to ask you one
> more thing :-)

Bah... I hit the Send button on that last email sooner than I wanted to :(

> In my dmesg at one point it says
> ==
> Kernelized RAIDframe activated
> dkcsum: wd0 matches BIOS drive 0x80
> dkcsum: wd1 matches BIOS drive 0x81
> root on wd0a

So this gets printed from autoconf.c but it *shouldn't* since

  boothowto |= RB_DFLTROOT;

in rf_openbsdkintf.c should cause the setroot() function to bail before printing the above. So for some reason it's not calling the appropriate bits in rf_buildroothack() in rf_openbsdkintf.c. But exactly why, I have no idea...

> [snip]
> Could you please shed any light on why my root device is still wd0a
> and not raid0?

No idea right now.. if you build a kernel with RAIDDEBUG defined and send the dmesg from that, I might be able to provide additional info...

> Thank you so much (especially for the simple make file introduction on
> your website)

:)

Later... Greg Oster
Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64
Siju George writes:
> Hi, the dmesg output shows Clean: Yes for both RAID components, as
> shown below:
>
> raid0: Component /dev/wd0d being configured at row: 0 col: 0
>   Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
>   Version: 2 Serial Number: 200612010 Mod Counter: 820
>   Clean: Yes Status: 0
> raid0: Component /dev/wd1d being configured at row: 0 col: 1
>   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
>   Version: 2 Serial Number: 200612010 Mod Counter: 820
>   Clean: Yes Status: 0
> raid0 (root)
> =
> but raidctl shows Clean: No as shown below. Could someone tell me why
> this is so? It is the same state even after reboots.

The value Yes or No comes directly from the component labels on the disks. If the parity is known good (i.e. the set is clean) when the RAID sets are unconfigured (actually, when the last open partition is unmounted), then the value in the component labels will be set to Yes. When a RAID set is configured and a partition is opened/mounted, the value in the component labels will be set to No. And so unless things get unmounted/unconfigured correctly, the value will remain at No until the parity gets checked.

What you are seeing here is:

a) the values reported by dmesg are from *before* any partitions on raid0 get opened. So if the RAID set was known clean, you'll see a value of Yes printed for each component, because that's what they got set to at the last shutdown/unmount/unconfigure/etc.

b) the values reported by raidctl are from *after* a partition on raid0 has been opened (even 'raidctl -vs raid0' ends up opening /dev/raid0c or whatever, resulting in that clean flag being changed from Yes to No). So it will always say No here, since that will be the current value in the component labels.

> Which one should I believe?

Both of them :) They are both correct for the time at which they are examining the datapoint in question. That said, the line to really care about is this one:

  Parity status: clean

> Is the Raid not working properly?

It's working just fine...
just probably telling you a bit more than you really wanted to know :) Later... Greg Oster
Re: Raidframe parity problems
Julian Labuschagne writes:
> Then I had to test the server before putting it into a production
> environment. So I switched off /dev/wd3a.

So at this point wd3a will get marked as failed...

> The system halted itself when I did that... oops.

So it wasn't a clean shutdown, and so the parity bits won't have been marked as clean.

> I started the system and it gave me the following error:
> raid0: Error re-writing parity.

Right.

> When I run the command:
> raidctl -s raid0
> raid0 Components:
>            /dev/wd1a: optimal
>            /dev/wd2a: optimal
>            /dev/wd3a: failed
> Spares:
>            /dev/wd4a: spare
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete
>
> I have tried running the following command:
> raidctl -P raid0
> raid0: Parity status: Dirty
> raid0: Initiating re-write of parity
> raid0: Error re-writing parity!
>
> I'm not sure what is going on here; my kernel is standard except for
> raidframe support compiled in. I just can't seem to rebuild the array.
> Anybody run into this problem before? Any help would be appreciated.

Everything is behaving normally. The system can't make sure the parity is up-to-date because it is missing a component. It is a bit of a misnomer to call the parity DIRTY at this point because, well, that's the only information you have to go on, and it's as good as it's going to get. But it's been left the way it is so that one can tell if the parity is known good or known questionable... In any event, 'raidctl -P' isn't going to do anything useful until you get wd3a (or its replacement) added back into the array.

Later... Greg Oster
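A sketch of the recovery path implied above, using the device names from the listing. The 'component2' position for /dev/wd3a is inferred from the output shown (components are numbered from 0); verify with `raidctl -s raid0` before running anything:

```shell
# wd4a is already configured as a hot spare, so reconstruct the failed
# component onto it; only then does the parity rewrite have a full set
# of components to work with.
raidctl -F component2 raid0   # rebuild failed /dev/wd3a onto the spare
raidctl -S raid0              # monitor reconstruction progress
raidctl -P raid0              # re-write parity once all components are good
```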
Re: RAIDFrame parity rebuild: why so slow?
Jeff Quast writes:
> On 10/3/06, Joerg Zinke [EMAIL PROTECTED] wrote:
> > On Mon, 02 Oct 2006 20:11:36 +0200 nothingness [EMAIL PROTECTED] wrote:
> > > Hi all, I've been using RAIDFrame on OpenBSD since 3.1 and in 4
> > > years I've never seen any performance improvement in getting the
> > > system to work any faster at rebuilding parity after a hard
> > > shutdown. I've tried RAID1, RAID5, SCSI drives, IDE drives,
> > > processors from PentiumII 400s to Athlon64 3200+ and it has
> > > *always* been ridiculously slow at rebuilding. Just a 9G RAID5
> > > partition takes over 2 hours. A 60G RAID1 takes 11 hours. 11!!!
> > > Before flaming me to say, just go and edit the code, it's never
> > > been out of beta or whatever, explain why compared to other OSes
> > > it's always so slow, even to build the first time around. Linux's
> > > code in particular comes to mind.
> >
> > maybe this is one of the reasons why raidframe is not officially
> > supported and not enabled in the stable kernel. I think another
> > reason is that it doubles the size of a kernel for a function 5% of
> > openbsd users use.

RAIDframe on i386 archs used to be about 500K, which is ~10% of the current size of /bsd. In a certain other BSD, RAIDframe now weighs in at about 148K for i386.

> [snip]
> Raidframe was originally a simulator. A simulator. It was never meant
> to be a kernel driver. It is not meant to ensure speed. It is not
> meant to actually be used to store real data.

RAIDframe was developed as a framework. It wasn't just a simulator. It wasn't just a user-land RAID driver. It wasn't just a kernel driver. It was built with all three to allow rapid prototyping of new types of RAID. Yes, there is some overhead to this, but it's not as large as the code size might suggest... (e.g. compare the performance difference between CCD vs RAID0..)

Later... Greg Oster
Re: Replacing a failed HD in a raidframe array
Jason Murray writes:
> [snip]
> So according to raidctl(8), once I add the new HD to the system I do a
> raidctl -a /dev/hd1d to add it as a spare, then do a raidctl -F
> component1 raid0 to force a rebuild. Then I would modify my
> /etc/raid0.conf to reflect my new device (which actually won't need
> modification in my case). First question: do I have this correct?

Yes. (s/hd1d/wd1d/ , of course)

> Second question: if the rebuild fails at 48% with bad disk block
> errors, does this mean that wd0 is bad?

Most likely. (it could be the new disk too, but that's less likely)

Later... Greg Oster
Re: RAIDframe Root on RAID -- configuring dump device
Josh Grosse writes:
> Has anyone using Root on RAID managed to point their dumpdev at a swap
> space, either within a RAID array or on a standard swap partition?

Dumping to a standard swap partition on a RAID set is not supported.

> I have not, and a search of the archives only came up with one
> posting, with a similar question, but no answer:
> http://marc.theaimsgroup.com/?l=openbsd-misc&m=111763609916743&w=2
>
> I'm running -current on i386, and have just successfully implemented
> RAID level 1 mirroring. I am using two Autoconfig devices: raid0 (ffs
> partitions), which is also set as the Root partition, and raid1
> (swap). My kernel configuration is GENERIC plus RAIDframe, which means
> that my config line reads:
>
>   config bsd swap generic
>
> When booting normally, with raid0a as root, I get this kernel message
> right before init starts:
>   swapmount: no device
> and then during rc I get:
>   savecore: no core dump
>
> I have tried modifying the config line. If I use:
>
>   config bsd root on wd0a swap on wd0b
>
> then I do get an unmirrored partition as my swap_device, and it is
> also a dump device. Does the config syntax support dumps on wd0b?

Dunno if you can use something like:

  config bsd root on wd0a swap on wd0b dumps on wd0b

but that might be sufficient...

> But ... adding /dev/raid1b doesn't work -- adding this device to
> /etc/fstab seems to be ignored, and swapctl -a /dev/raid1b fails with
> file not found. raid1b is an unacceptable keyword for kernel config.

Dunno about any of this. I'd have suggested using 'swapctl -D /dev/wd0b', but I don't believe that'd work for you...

> Anyone with a successful swap/dump setup who might be able to point me
> to what I'm missing?

You should be able to do it, but not to swap on a RAID set...

Later... Greg Oster
Re: RAIDframe Root on RAID -- configuring dump device
Josh Grosse writes:
> On Tue, Aug 29, 2006 at 02:28:50PM -0600, Greg Oster wrote:
> > Josh Grosse writes:
> > > Has anyone using Root on RAID managed to point their dumpdev at a
> > > swap space, either within a RAID array or on a standard swap
> > > partition?
> >
> > Dumping to a standard swap partition on a RAID set is not supported.
>
> Could you clarify what you mean?

As in, the raiddump() function returns ENXIO, and has a comment saying Not implemented.. In other words, ENOWORKIE :)

> I have a raid1b partition marked as swap, and a wd0b partition marked
> as swap, and I have not figured out how to get a dump device assigned,
> so far, unless I use swap on wd0b -- which is unmirrored. I have no
> problem with having an unprotected dump area, but I am concerned about
> using the partition as swap space.

Right... If you're going to all the trouble of having a system on RAID, you really want swap on RAID too... but not dump...

> snip
> > > I have tried modifying the config line. If I use:
> > >   config bsd root on wd0a swap on wd0b
> > > then I do get an unmirrored partition as my swap_device, and it is
> > > also a dump device. Does the config syntax support dumps on wd0b?
> >
> > Dunno if you can use something like:
> >   config bsd root on wd0a swap on wd0b dumps on wd0b
> > but that might be sufficient...
>
> Since root on wd0a swap on wd0b assigns the dump device, isn't dumps
> on wd0b redundant? Or have I misunderstood you? Do you think an
> explicit assignment would change this behavior?

I'm not sure if OpenBSD even supports that syntax... (The OS with which I'm most familiar does support that syntax, and it allows you to have swap and dump space in different places, if that's what you want/need... And yes, the explicit assignment, if you can do it, would change it...)

> > > But ... adding /dev/raid1b doesn't work -- adding this device to
> > > /etc/fstab seems to be ignored, and swapctl -a /dev/raid1b fails
> > > with file not found. raid1b is an unacceptable keyword for kernel
> > > config.
> > Dunno about any of this. I'd have suggested using 'swapctl -D
> > /dev/wd0b', but I don't believe that'd work for you...
>
> There is no -D in current. Is there an uncommitted swapctl.c in
> development?

Dunno... swapctl in NetBSD has it, which is why I was going to suggest it, but it seems that swapctl in OpenBSD doesn't have it...

> > > Anyone with a successful swap/dump setup who might be able to
> > > point me to what I'm missing?
> >
> > You should be able to do it, but not to swap on a RAID set...
>
> I can swap on a RAID set just fine, but only if I leave the config
> line in GENERIC untouched. But if I do that, I have no dump device. I
> seem to be able to swap and dump to non-raid through altering the
> config line, as I described. But if I do that, I cannot then add a
> RAID set to the swap list.

Yuck... :-/ I'll have to defer to others more familiar with OpenBSD to comment on how to get around that little problem. But I do know that dumping to a RAID device will not work at all.

Later... Greg Oster
Re: raidctl on a live raid array, and the kernel debugger
Jason Murray writes:
> I've tried, again, to fix my raid array with raidctl -R. I did it on
> the console port this time so I could capture the output from ddb.
> Here is some output:
>
> # raidctl -s raid0
> raid0 Components:
>            /dev/wd0d: failed
>            /dev/wd1d: optimal
> No spares.
> Parity status: DIRTY
> Reconstruction is 100% complete.
> Parity Re-write is 100% complete.
> Copyback is 100% complete.
>
> So I attempt an in-place reconstruction of wd0d.
>
> # raidctl -R /dev/wd0d raid0
> Closing the opened device: /dev/wd0d
> About to (re-)open the device for rebuilding: /dev/wd0d
> RECON: Initiating in-place reconstruction on row 0 col 0 - spare at
> row 0 col 0.
> Quiescence reached...
>
> I then use raidctl -S to monitor the reconstruction. Things go well
> until the 48% mark. Then I get:
>
> wd1d: uncorrectable data error reading fsbn 111722176 of
> 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
> wd1: transfer error, downgrading to Ultra-DMA mode 4
> wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 4
> wd1d: uncorrectable data error reading fsbn 111722176 of
> 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
> wd1d: uncorrectable data error reading fsbn 111722248 of
> 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1), retrying
> wd1d: uncorrectable data error reading fsbn 111722248 of
> 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1)
> raid0: IO Error. Marking /dev/wd1d as failed.
> Recon read failed !
> panic: RAIDframe error at line 1518 file
> /usr/src/sys/dev/raidframe/rf_reconstruct.c
> Stopped at Debugger+0x4: leave
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS
> PANIC! DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT
> INFORMATION!
>
> This concerns me because I need wd1d to rebuild my failed wd0d. Any
> ideas? Drive cables maybe? Any help is greatly appreciated.

You have recent backups, right? wd1 is failing/dying. At this point you're probably better off attempting to use 'dd' to recover as many bits as you can...
(if you do a 'dump' of the filesystem you can probably tell from that whether or not there is any 'live data' in the portion that is unreadable. If there isn't any live data, then you can use 'dd' to make as much of a copy as possible of wd1, and use that as the base for reconstructing the RAID set.)

Later... Greg Oster
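The dd salvage idea can be sketched like this. The flags are standard dd; the demonstration runs against a scratch file, since the real command would need the failing disk (e.g. /dev/rwd1d, a name assumed from the thread) attached:

```shell
# conv=noerror,sync tells dd to keep going past read errors, padding
# unreadable blocks with zeros instead of aborting -- so one bad sector
# doesn't stop the copy of everything else.
#
# On the real hardware this would be something like:
#   dd if=/dev/rwd1d of=/altdisk/wd1.img bs=64k conv=noerror,sync
#
# Demonstrated here on a scratch file:
dd if=/dev/urandom of=/tmp/wd1.img bs=512 count=64 2>/dev/null
dd if=/tmp/wd1.img of=/tmp/salvage.img bs=512 conv=noerror,sync 2>/dev/null
cmp /tmp/wd1.img /tmp/salvage.img && echo "copy complete"
```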
Re: raidctl on a live raid array, and the kernel debugger
Jeff Quast writes:
> My first few months with raidframe caused many kernel panics. With 30
> minutes of parity checking, this was a difficult learning experience.
> I was initially led to believe that raidframe was hardly stable (and
> therefore disabled in GENERIC). However, as I gained experience with
> raidctl and raidframe, and traced the panics to code level, I almost
> always found the panics were caused by my misuse or misinterpretation
> of raidctl(8). A small book could probably be written on the many
> different situations you can find yourself in with raidframe. I
> haven't had a kernel panic for a long time, and have had 3 disks fail
> since on a level 5 raid without issue reconstructing, changing
> geometry, etc. If memory serves me, I may have reconstructed a mounted
> raidset, though given the choice, I certainly wouldn't.

RAIDframe was built to allow reconstructing a mounted RAID set... in fact, it goes to a lot of trouble to allow that to happen properly... The only 'problem' you might notice would be a performance degradation for both the rebuild and any user IO taking place...

> All in all, I find kernel panics with raidframe is just its way of
> saying Bad choice of arguments :)

RAIDframe in OpenBSD is somewhat lax about checking the input provided by raidctl... It works quite well if you don't tell it to do anything it's not expecting :-} (most (all?) of those problems have long since been cleaned up -- unfortunately not in the code base that's in OpenBSD though :( )

Later... Greg Oster
Re: no raid reconstruction with autoconfigured sets
Walter Haidinger writes: Hi! Summary: raid set reconstruction fails with error rewriting parity for sets with non-root autoconfigure enabled, works when disabled. It seems as if there is a bug when reading the component label. Details: I'm running an OpenBSD 3.9 GENERIC kernel with RAID enabled. That is, no other changes but the ones from raid(4): pseudo-device raid 4 option RAID_AUTOCONFIG I'm running a raid1 mirror of both ide channel master devices. After a complete disk failure of wd1, I replaced the faulty drive and rebooted (came up in degraded mode on wd0 just fine) and did fdisk/disklabel to match wd0 layout which looks as in Auto-configuration and Root on RAID of raidctl(8): wd[01]a: minimal openbsd install (/bsd is RAID capable kernel) wd[01]e: raid0 (raid0a is /) wd[01]f: raid1 (raid1b is swap) wd[01]g: raid2 (raid2d is /usr, raid2e is /var, ...) All raid sets are set to autoconfigure, raid0 as root autoconfigure. Then I tried to resync using the method from raidctl(8), bottom of Dealing with Component Failures, i.e.: # raidctl -a /dev/wd1e raid0 # raidctl -F component1 raid0 # raidctl -a /dev/wd1f raid1 # raidctl -F component1 raid1 # raidctl -a /dev/wd1g raid2 # raidctl -F component1 raid2 Only rebuilding the root autoconfigured raid0 set succeeded. Non-root sets raid1 and raid2 failed with raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL). Adding a spare did work: # raidctl -a /dev/wd1g raid1 Isn't that the spare you used for raid2 ? # raidctl -vs raid1 raid1 Components: /dev/wd0f: optimal component1: failed Spares: /dev/wd1f: spare Oh.. but here it's correct.. Component label for /dev/wd0f: Row: 0, Column: 0, Num Rows: 1, Num Columns: 2 Version: 2, Serial Number: 298644, Mod Counter: 657 Clean: No, Status: 0 sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1 Queue size: 100, blocksize: 512, numBlocks: 1024000 RAID Level: 1 Autoconfig: Yes Root partition: No Last configured as: raid1 component1 status is: failed. Skipping label. /dev/wd1f status is: spare. 
Skipping label. Parity status: DIRTY Reconstruction is 100% complete. Parity Re-write is 100% complete. Copyback is 100% complete. However, failure and immediate reconstruction did not work: # raidctl -F component1 raid1 # raidctl -vs raid1 raid1 Components: /dev/wd0f: optimal component1: reconstructing Spares: /dev/wd1f: used_spare Component label for /dev/wd0f: Row: 0, Column: 0, Num Rows: 1, Num Columns: 2 Version: 2, Serial Number: 298644, Mod Counter: 658 Clean: No, Status: 0 sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1 Queue size: 100, blocksize: 512, numBlocks: 1024000 RAID Level: 1 Autoconfig: Yes Root partition: No Last configured as: raid1 component1 status is: reconstructing. Skipping label. raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed Hmm.. where are the lines saying reconstruction is n% complete? (they aren't pretty, but in this case they'd be useful) raidctl -F subsequently fails with error rewriting parity. That error will only come from attempting to check parity on a RAID set with a failed component. It has nothing to do with raidctl -F. How long did you wait for the reconstruction to finish? For the above output, note that it still says reconstructing for component1... When that finishes, it will say spared. Later... Greg Oster
Re: no raid reconstruction with autoconfigured sets
Walter Haidinger writes: First of all: Thanks for replying to an issue with a non-generic kernel! I really appreciate that! That it was a non-generic kernel didn't even cross my mind... it was an issue w/ RAIDframe, and that's why I responded... On Thu, 29 Jun 2006, Greg Oster wrote: Adding a spare did work: # raidctl -a /dev/wd1g raid1 Isn't that the spare you used for raid2 ? Sorry, cut-and-paste error, should have been wd1f. Hmm.. where are the lines saying reconstruction is n% complete? (they aren't pretty, but in this case they'd be useful) I'm sorry, I did not record those. Reconstructing did take some time, though, I recall checking the progress, nothing suspicious there. So did the reconstruction actually complete? raidctl -F subsequently fails with error rewriting parity. That error will only come from attempting to check parity on a RAID set with a failed component. It has nothing to do with raidctl -F. Oh yes, of course! Should have mentioned that I've tried raidctl -P after raidctl -F ... Ok... so the big question is still: how far along was the reconstruction? raidctl -P would fail even if the reconstruct was still in progress. How long did you wait for the reconstruction to finish? For the above output, note that it still says reconstructing for component1... When that finishes, it will say spared. And what about the spare? Shouldn't it replace component1? It won't replace it in the output of 'raidctl -s', but it will replace component1 for all accesses and what-not.. (and will take its proper place (with autoconfig turned on) after a reboot (well... sans a known bug in rf_reconstruct.c where this line: c_label.partitionSize = raidPtr->Disks[srow][scol].partitionSize; should be added to where it says: /* MORE NEEDED HERE. */ ) That never happened. Instead, component1 sequence was: failed - reconstructing - failed. Hmm... I think you should see failed-reconstructing-spared (that's what you'd see if 'component1' was a normal disk...) 
You might want to check /var/log/messages* for some indication as to why the reconstruction failed... (as well, there should be something in there indicating the reconstruction completed, if it did...) Later... Greg Oster
Re: RAIDframe, swapping components in a RAID 1 array
Paul Wright writes: Hi all, I've followed a set of instructions[1] describing a method of installing OpenBSD onto a RAID 1 array created with raidctl using only 2 disks (sd0b + sd1b). The basic premise is to first install normally onto one disk (sd0b) and then create a degraded RAID 1 array using the second disk (sd1b) and a fake third disk (sd2b). After booting off the array you then add the original first disk (sd0b) to the array and rebuild. This works, but the changes don't 'stick' between reboots; the array promptly forgets about sd0b: # raidctl -s raid0 raid0 Components: component0: failed /dev/sd1b: optimal No spares. Parity status: clean Reconstruction is 100% complete. Parity Re-write is 100% complete. Copyback is 100% complete. # raidctl -a /dev/sd0b raid0 # raidctl -F component0 raid0 # raidctl -s raid0 raid0 Components: component0: spared /dev/sd1b: optimal Spares: /dev/sd0b: used_spare Parity status: clean Reconstruction is 100% complete. Parity Re-write is 100% complete. Copyback is 100% complete. Try doing a: raidctl -I 605190 raid0 here before rebooting. I seem to recall a bug related to component labels on used spares not being updated properly after a reconstruct, and I think re-running the '-I' option was the workaround... Later... Greg Oster
Re: RAID label problem?
Xavier Mertens writes: Hi, I'm busy setting up a box with 2 x 80GB disks in RAID1. I'm following the procedures found online but, once the RAID is initialized, I got the following error while trying to partition it: Write new label?: [y] disklabel: ioctl DIOCWDINFO: No space left on device disklabel: unable to write label The RAID is up, consistent: # raidctl -s raid0 raid0 Components: /dev/wd0d: optimal /dev/wd1d: optimal No spares. Parity status: clean Reconstruction is 100% complete. Parity Re-write is 100% complete. Copyback is 100% complete. disklabel reports the following: # disklabel -E raid0 disklabel: Can't get bios geometry: Device not configured Initial label editor (enter '?' for help at any prompt) p device: /dev/rraid0c type: RAID disk: raid label: fictitious bytes/sector: 512 sectors/track: 128 tracks/cylinder: 8 sectors/cylinder: 1024 cylinders: 156417 total sectors: 160171392 free sectors: 160171392 rpm: 3600 16 partitions: # size offset fstype [fsize bsize cpg] a: 435841403 1416925149 unused 0 0 # Cyl 1383715*-1809342* c: 160171392 0 unused 0 0 # Cyl 0 - 156417* 435841403 + 1416925149 = 1852766552 which is greater than 160171392 by 1692595160. If you fix the offset of 'a', I suspect things will be happier. Later... Greg Oster
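The sanity check Greg does by hand generalizes: a disklabel partition is only consistent if its offset plus its size stays within the device's total sector count. A quick sketch with the numbers from the label above:

```shell
# Numbers taken from the disklabel output above.
total=160171392        # total sectors on raid0
a_size=435841403       # size of partition 'a'
a_offset=1416925149    # offset of partition 'a'

# A partition must satisfy: offset + size <= total sectors.
end=$((a_size + a_offset))
echo "partition 'a' ends at sector $end"
if [ "$end" -gt "$total" ]; then
    echo "overruns the device by $((end - total)) sectors"
fi
```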
Re: RAID label problem?
Xavier Mertens writes: Well, I already tried to create only a small partition: [snip] p device: /dev/rraid0c type: RAID disk: raid label: fictitious bytes/sector: 512 sectors/track: 128 tracks/cylinder: 8 sectors/cylinder: 1024 cylinders: 156417 total sectors: 160171392 free sectors: 159761792 rpm: 3600 16 partitions: # size offset fstype [fsize bsize cpg] a: 409600 0 4.2BSD 2048 16384 16 # Cyl 0 - 399 c: 160171392 0 unused 0 0 # Cyl 0 - 156417* q Write new label?: [y] y disklabel: ioctl DIOCWDINFO: No space left on device disklabel: unable to write label What does 'raidctl -s raid0' say? There are not many places in the DIOCWDINFO code path where ENOSPC is returned... but one of them is in raidstrategy(). Later... Greg Oster
Re: RAIDframe parity errors and rebuild
David Wilk writes: this was exactly my thought. I was hoping someone would have some 'official' knowledge, or opinion. I still can't get over having to wait several hours for my root partition to become available after an improper shutdown. On 3/18/06, Joachim Schipper [EMAIL PROTECTED] wrote: On Sat, Mar 18, 2006 at 12:59:30PM +0200, Antonios Anastasiadis wrote: I had the same question, and just changed the relevant line in /etc/rc adding '&' at the end: raidctl -P all & Then again, why is this not the default? Are you certain this actually works? Joachim If you want to be 100% paranoid, then you want to wait for the 'raidctl -P all' to update all parity before starting even fsck's. There *is* a non-zero chance that the parity might be out-of-sync with the data, and should a component die before that parity has been updated, then you could end up reading bad data. This can happen even if the filesystem has been checked. What are the odds of this happening? Pretty small. If 'raidctl -P all &' is run, then the larger problem is both fsck and raidctl will be fighting for disk cycles -- i.e. the fsck will take longer to complete. On more critical systems, this is how I typically have things set up (I'm willing to risk it that I'm not going to have a disk die during the minutes that it takes to do the fsck). On less critical boxes, I've got a sleep 3600 before the 'raidctl -P', so that the parity check doesn't get in the way of the fsck or the system coming up... about an hour after it comes up, the disks are then checked... It's one of those what are the odds games... allowing the raidctl to run in the background seems to have the right mix of paranoia and practicality... Later... Greg Oster
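The delayed background check Greg describes might look like the sketch below. On the real system the rc fragment would be ( sleep 3600; raidctl -P all ) & ; the demo substitutes a one-second delay and an echo for raidctl so it can be run anywhere.

```shell
# Stand-in for the rc fragment:  ( sleep 3600; raidctl -P all ) &
# Backgrounding the subshell lets boot continue immediately while the
# parity check waits its turn, so it never competes with fsck.
( sleep 1; echo "parity check starts now" ) &
echo "boot continues immediately"
wait    # only so this demo collects the job; an rc script would not wait
```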
Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
Adam PAPAI writes: Hello misc, I have an IBM xSeries 335 machine with Dual Xeon processor and 2x73GB SCSI Seagate Barracuda 10K rpm disc. I run OpenBSD 3.8 on it. When I'm creating the raid array (raidctl -iv raid0), I get the following error message: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28 SENSE KEY: Media Error INFO: 0x224c10c (VALID flag on) ASC/ASCQ: Read Retries Exhausted SKSV: Actual Retry Count: 63 raid0: IO Error. Marking /dev/sd0d as failed. raid0: node (Rod) returned fail, rolling backward Unable to verify raid1 parity: can't read stripe. Could not verify parity. Is this early in the initialization or late in the initialization? Try doing: dd if=/dev/rsd0d of=/dev/null bs=10m and see if you get the same error message... Later... Greg Oster
Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
Adam PAPAI writes: Greg Oster wrote: Adam PAPAI writes: Hello misc, I have an IBM xSeries 335 machine with Dual Xeon processor and 2x73GB SCSI Seagate Barracuda 10K rpm disc. I run OpenBSD 3.8 on it. When I'm creating the raid array (raidctl -iv raid0), I get the following error message: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28 SENSE KEY: Media Error INFO: 0x224c10c (VALID flag on) ASC/ASCQ: Read Retries Exhausted SKSV: Actual Retry Count: 63 raid0: IO Error. Marking /dev/sd0d as failed. raid0: node (Rod) returned fail, rolling backward Unable to verify raid1 parity: can't read stripe. Could not verify parity. Is this early in the initialization or late in the initialization? Try doing: dd if=/dev/rsd0d of=/dev/null bs=10m and see if you get the same error message... # dd if=/dev/rsd0d of=/dev/null bs=10m 6977+1 records in 6977+1 records out 73160687104 bytes transferred in 1043.771 secs (70092636 bytes/sec) # dd if=/dev/rsd1d of=/dev/null bs=10m 6977+1 records in 6977+1 records out 73160687104 bytes transferred in 1027.051 secs (71233712 bytes/sec) # This means no hdd error.. Well... no hdd error for this set of reads... Hm. What if you push both drives at the same time: dd if=/dev/rsd0d of=/dev/null bs=10m & dd if=/dev/rsd1d of=/dev/null bs=10m ? (Were the drives warm when you did this test, and/or when the original media errors were reported? Does a 'raidctl -iv raid0' work now or does it still trigger an error? ) Then probably the raidFrame has the problem I guess.. RAIDframe doesn't know anything about SCSI controllers or SCSI errors... all it knows about are whatever VOP_STRATEGY() happens to return to it from the underlying driver... I have to use /altroot on /dev/sd1a then, or is there a patch for raidframe to fix this? There is no patch for RAIDframe to fix this. 
There is either a problem with the hardware (most likely), some sort of BIOS configuration issue (is it negotiating the right speed for the drive?), or (less likely) a mpt driver issue. Once you figure out what the real problem is and fix it, RAIDframe will work just fine :) Later... Greg Oster
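Greg's "push both drives at the same time" test can be sketched like this. The real raw partitions (/dev/rsd0d, /dev/rsd1d) are replaced here with two stand-in files so the pattern is runnable anywhere; backgrounding each dd with '&' is what makes the two reads overlap.

```shell
# Stand-in "disks"; on the real box these would be the raw partitions
# /dev/rsd0d and /dev/rsd1d.
dd if=/dev/zero of=/tmp/disk0.img bs=1k count=64 2>/dev/null
dd if=/dev/zero of=/tmp/disk1.img bs=1k count=64 2>/dev/null

# Read both at once: '&' backgrounds each read, 'wait' blocks until
# both finish. Overlapping IO stresses the controller, cabling and
# termination the way a rebuild would; one-at-a-time reads do not.
dd if=/tmp/disk0.img of=/dev/null bs=1k 2>/dev/null &
dd if=/tmp/disk1.img of=/dev/null bs=1k 2>/dev/null &
wait
echo "both reads completed"
```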
Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
Adam PAPAI writes: After reboot my dmesg ends with: rootdev=0x400 rrootdev=0xd00 rawdev=0xd02 Hosed component: /dev/sd0d. raid0: Ignoring /dev/sd0d. raid0: Component /dev/sd1d being configured at row: 0 col: 1 Row: 0 Column: 1 Num Rows: 1 Num Columns: 2 Version: 2 Serial Number: 100 Mod Counter: 27 Clean: No Status: 0 /dev/sd1d is not clean ! raid0 (root)raid0: no disk label raid0: Error re-writing parity! dd if=/dev/rsd0d of=/dev/null bs=10m dd if=/dev/rsd1d of=/dev/null bs=10m both ended successfully. # raidctl -iv raid0 What does 'raidctl -s raid0' say? It probably says that 'sd0d' is failed. You can't initialize parity with 'raidctl -iv' on a set with a failed component. You can do 'raidctl -vR /dev/sd0d raid0' to get it to reconstruct back onto the failed component. After that you can do a 'raidctl -iv' (though by that point it's strictly not necessary). Later... Greg Oster
Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
Adam PAPAI writes: Greg Oster wrote: Adam PAPAI writes: After reboot my dmesg ends with: rootdev=0x400 rrootdev=0xd00 rawdev=0xd02 Hosed component: /dev/sd0d. raid0: Ignoring /dev/sd0d. raid0: Component /dev/sd1d being configured at row: 0 col: 1 Row: 0 Column: 1 Num Rows: 1 Num Columns: 2 Version: 2 Serial Number: 100 Mod Counter: 27 Clean: No Status: 0 /dev/sd1d is not clean ! raid0 (root)raid0: no disk label raid0: Error re-writing parity! dd if=/dev/rsd0d of=/dev/null bs=10m dd if=/dev/rsd1d of=/dev/null bs=10m both ended successfully. # raidctl -iv raid0 What does 'raidctl -s raid0' say? It probably says that 'sd0d' is failed. You can't initialize parity with 'raidctl -iv' on a set with a failed component. You can do 'raidctl -vR /dev/sd0d raid0' to get it to reconstruct back onto the failed component. After that you can do a 'raidctl -iv' (though by that point it's strictly not necessary). Interesting. I tried 3 full reinstalls and raidctl -iv raid0 failed every time, but raidctl -vR /dev/sd0d solved the problem. But why? It didn't solve the Media Error... the Media Error just didn't show up again. Will it be good from now? If I had to pick from one of Yes or No, I'd pick No. I'm afraid the raid will collapse again. I hope not. I'm going to continue the setup on my server. Thanks anyway. I hope I won't get more errors... I hope so too... but nothing in 'raidctl -vR' really fixes media errors... (Since 'raidctl -R' is going to write to sd0, it's possible that the drive has now re-mapped whatever bad block was on sd0, and sd0 may work fine now... but it's unusual to see the same error on 2 different drives... makes me maybe suspect cabling too..) Later... Greg Oster
Re: RAIDframe question
Håkan Olsson writes: On 1 feb 2006, at 08.38, Jurjen Oskam wrote: On Wed, Feb 01, 2006 at 01:19:58AM -0500, Peter wrote: raid0: Device already configured! ioctl (RAIDFRAME_CONFIGURE) failed Can anyone lend a hand in this important matter? Let me guess (since you didn't post any configuration): you enabled RAID-autoconfiguration by the kernel *and* you configure the same RAID-device during the boot sequence using raidctl? /etc/rc includes commands to configure the raid devices, and if they've been set up to use autoconfiguration then this is indeed what happens. Expected and nothing to worry about, although noisy. What he said. For my raidframe devices, I just removed the autoconfigure flag. Please use the autoconfigure flag. It is *far* better at gluing together a RAID set than the regular configuration bits, especially in the face of drives that move about or drives that fail to spin up... (the old config code needs to find its way into a bit-bucket..) You really want to use the autoconfigure bits.. :) Really. :) Later... Greg Oster
Re: RAIDframe question
Peter writes: I tried unsuccessfully using the same procedure to set up two disks (sd0 and sd1) attached to a QLogic FibreChannel controller (isp driver). I probably don't have the correct terminology but upon startup the boot code could not be found (would not get beyond the point where the kernel usually kicks in). I'm wondering whether RAIDframe has limitations with this hardware. RAIDframe doesn't care about underlying hardware. It's run on top of a) probably every flavour of SCSI, b) various levels of IDE/pciide, c) FibreChannel, d) ancient things like HP-IB, and e) other RAIDframe devices. If the underlying device can provide something that looks/smells like a disk partition, that's good enough for RAIDframe. Later... Greg Oster
Re: RAIDframe question
Peter Fraser writes: I had a disk drive fail while running RAIDframe. The system did not survive the failure. Even worse there was data loss. Ow. The system was to be my new web server. The system had 1 Gig of memory. I was working, slowly, on configuring apache and web pages. Moving to a chroot'ed environment was non-trivial. The disk drive died, the system crashed, Oh so it *wasn't* just a simple case of a drive dying, but the system crashed too... Well, RAIDframe can't make any guarantees when there's a system crash -- if buffers haven't been flushed or there's still pending meta-data to be written, there's not much RAIDframe can do about that... those are filesystem issues. and the system rebooted and came up. Removing the dead disk, replacing it with a new disk and reestablishing the raid was no problem. But why was there a crash, I would have thought that the system should run after a disk failure. You haven't said what types of disks. I've had IDE disks fail that take down the entire system. I've had IDE disks fail but the system remains up and happy. I've had SCSI disks fail that have made the SCSI cards *very* unhappy (and had the system die shortly after). None of these things can be solved by RAIDframe -- if the underlying device drivers can't deal in the face of lossage, RAIDframe can't do anything about that... You also haven't given any indication as to the nature of the crash, or what the panic message was (if any). (e.g. was it a null-pointer dereference, or a corrupted filesystem or something that went wrong in the network stack?) And even more to my surprise, about two days of my work disappeared. Of course, you just went to your backups to get that back, right? :) I believe, the disk drive died about 2 days before the crash. I also believe that RAIDframe did not handle the disk drive's failure correctly Do you have a dmesg related to the drive failure? e.g. 
something that shows RAIDframe complaining that something was wrong, and marking the drive as failed? and as a result all file writes to the failed drive queued up in memory, I've never seen that behaviour... I find it hard to believe that you'd be able to queue up 2 days worth of writes without a) any reads being done or b) not noticing that the filesystem was completely unresponsive when a write of associated meta-data never returned... (on the first write of meta-data that didn't return, pretty much all IO to that filesystem should grind to a halt. Sorry.. I'm not buying the it queued up things for two days... ) when memory ran out the system crashed. I don't know enough about OpenBSD internals to know if my guess as to what happened is correct, but it did worry me about the reliability of RAIDframe. I've been running RAIDframe (albeit not w/ OpenBSD) in both production and non-production environments now for 7+ years... RAIDframe reliability is the least of my worries :) (RAIDframe has also saved mine and others' data on various occasions over the years...) I am now trying ccd for my web pages and ALTROOT in daily for root, I have not had a disk fail with ccd yet, so I have not determined whether ccd works better. Good luck. (see a different thread for my thoughts on using ccd :) Neither RAIDframe nor ccd seems to be up to the quality of nearly all the other software in OpenBSD. This statement is also true of the documentation. My only comment on that is that the version of RAIDframe in OpenBSD is somewhat dated. You are also encouraged to find and read the latest versions of the documentation, and to provide feedback to the author on what you feel is lacking. Later... Greg Oster
Re: RAIDframe question
Andy Hayward writes: On 2/1/06, Greg Oster [EMAIL PROTECTED] wrote: Peter Fraser writes: and as a result all file writes to the failed drive queued up in memory, I've never seen that behaviour... I find it hard to believe that you'd be able to queue up 2 days worth of writes without a) any reads being done or b) not noticing that the filesystem was completely unresponsive when a write of associated meta-data never returned... (on the first write of meta-data that didn't return, pretty much all IO to that filesystem should grind to a halt. Sorry.. I'm not buying the it queued up things for two days... ) I've seen similar on a machine with a filesystem on a raid-1 partition and mounted with softdeps enabled. From what I remember the scenario was something like: * copied 10Gb or so of data to new raid-1 filesystem * system then left idle for 30mins or so * being an idiot, pulled the wrong plug out of the wall * upon reboot, and after raid resync and fsck, most of the copied data was no longer there RAIDframe can only write what it's given. If, after 30 minutes, the filesystem layers haven't synced all the data, RAIDframe can't do anything about that... if left idle for 30 minutes, that filesystem should have synced itself many times over, to the point that fsck shouldn't have found anything to complain about... (I strongly suspect you'd see exactly the same behaviour without RAIDframe involved here... I also suspect you wouldn't see the same behavior without softdeps, RAIDframe or not.) Later... Greg Oster
Re: RAIDframe issues on 3.8
Dave Diller writes: Here's what's been changed in the kernel: not a lot. I don't understand why it would panic on a simple reconstruct command. I might be able to understand, but my crystal ball is at the cleaners, and I can't guess what your panic message looked like, nor what the traceback was. Here's the current status of the RAID: [snip] I can't fix the parity either now, it fails on both a -i and a -P attempt. Right. The RAID set only has one good component. There's nothing to rebuild parity onto with just one good component. wd2a's component label looks fine: [snip] wd1a's does not... pretty much every bit of data that could be changed, has been somehow. I suspect this is part of the root reason that raid0 can't deal with it, but I can't seem to get it to reinitialize correctly either. bash-3.00# raidctl -g /dev/wd1a raid0 Component label for /dev/wd1a: Row: 16, Column: 24, Num Rows: 1312, Num Columns: 16 Version: 0, Serial Number: 0, Mod Counter: 8 Clean: Yes, Status: 1133920558 sectPerSU: 9775536, SUsPerPU: 9620351, SUsPerRU: 119 Queue size: 2048, blocksize: 8, numBlocks: 5 RAID Level: Autoconfig: Yes Root partition: Yes Last configured as: raid256 Hmmm... I don't understand this... the label should be the same as the other, sans the Column and Mod Counter fields. (and possibly Clean and Status). [snip] RAIDFRAME: protectedSectors is 64. Hosed component: /dev/wd1a. Hosed component: /dev/wd1a. raid0: Component /dev/wd2a being configured at row: 0 col: 0 Row: 0 Column: 0 Num Rows: 1 Num Columns: 2 Version: 2 Serial Number: 100 Mod Counter: 183 Clean: No Status: 0 /dev/wd2a is not clean ! raid0: Ignoring /dev/wd1a. RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep limit. raid0 (root)raid0: Error re-writing parity! I agree it's 'hosed', just looking at the component label! Nice message. :) If I umount the partition and try to -u(nconfigure) raid0, it kernel panics. That shouldn't happen either. 
If I -R(econstruct), it kernel panics. I've messed it up but good. How do I reinitialize wd1a and/or raid0 and/or start over completely? You'll have to boot without /etc/raid0.conf. You can then re-do the config with -C and then -I again, but that won't help you when a disk fails and you get the panic messages on trying to reconstruct. (raidctl -R should *not* panic... (in the version of RAIDframe you're running, it will if it encounters a read error or a write error while doing the reconstruct, but that's probably a different problem)). Oh... and from the obvious bugs department: In rf_openbsdkintf.c case RAIDFRAME_GET_COMPONENT_LABEL: there is a: RF_Free( clabel, sizeof(RF_ComponentLabel_t)); missing before the: return(EINVAL); But that won't help with the problem you're describing... (just noticed the above as I was perusing the code..) Later... Greg Oster
Re: RAIDframe issues on 3.8
Dave Diller writes: Oh... and from the obvious bugs department: In rf_openbsdkintf.c case RAIDFRAME_GET_COMPONENT_LABEL: there is a: RF_Free( clabel, sizeof(RF_ComponentLabel_t)); missing before the: return(EINVAL); But that won't help with the problem you're describing... (just noticed the above as I was perusing the code..) Ha! Guess it helps when you wrote the original version, eh? Nice. Well... 7 years ago now for some of these bits :) (and yes, this bug is entirely mine :) ) It definitely seems to be related to issues with rf_openbsdkintf.c though - I was just pointed to this bug by the gentleman who opened it a couple of months ago: http://cvs.openbsd.org/cgi-bin/query-pr-wrapper?full=yes&numbers=4508 Ahh.. that one. which has the same panic that I'm seeing. Sorry for not including it initially, BTW. Didn't have an easy way to do that since I'm remote with no console. Resolution was State-Changed-Why: Fixed in revision 1.28 of rf_openbsdkintf.c, thanks for the report and I'm running /* $OpenBSD: rf_openbsdkintf.c,v 1.27 2004/11/28 02:47:14 pedro Exp $ */ So, time to resolve that via the latest -stable and try again. Yup. Do you have the cycles to get a bug in queue for the one you spotted on a quick once-over, before someone gets nailed by THAT one? I could open it, but it would merely say didn't run into the problem, but Greg Oster says its an obvious bug... ;-) I mentioned it here since it's an easy one for someone to fix... You can file a problem report if you'd like, but I don't want to get started filing PR's for RAIDframe stuff in OpenBSD -- there have been a lot of changes/fixes to RAIDframe in the last 5 years that aren't reflected in the code in OpenBSD, and I wouldn't know where to begin :) Later... Greg Oster
Re: Updated CCD Mirroring HOWTO
Nick Holland writes: Greg Oster wrote: ... Here's what I'd encourage you (or anyone else) to do: actually, I'd encourage you do try your own test. Results were interesting. Well... as we see, you did *your* version of the test, not mine ;) 1) Create a ccd as you describe in the HOWTO and mount the filesystem. used my own instructions, if you don't mind. :) Softdeps on. That may matter. Or it may not. Not sure. Shouldn't be a big deal either way.. 2) Start extracting 5 copies of src.tar.gz onto the filesystem ( simultaneously is preferred, but basically anything that will generate a lot of IO here is what is needed). I wussed out here. Did one unpacking of a Maildir in a .tgz file. But lots of IO, lots of thrashing, disks were basically saturated with work, processor was waiting for disk. Lots of tiny files. On the other hand, that's a lot more activity than this machine will ever see in production. Um... that's just one thread of IO... 64K (or whatever MAXPHYS is) presented, in sequence, to the underlying driver. A rather boring sequence of IO, with not much chance for one disk to get ahead or behind the other in terms of servicing requests. The 5 was there for a reason :) So, actually, was src.tar.gz. To make things more interesting, do a whole mess of reads from the ccd while you're doing the 5 extractions (preferably for something that isn't cached). (If I were testing this on my machine, I'd likely start with 10 different copies of src.tar.gz on the ccd, and then extract all 10 simultaneously (to different destinations on the ccd). Once that was going, I'd then start about 50 dd's of the src.tar.gz files, each dd starting about 10 seconds after the previous. When all IO had begun, I'd wait a few minutes and *then* pull the rug out from the system. 
But I didn't expect anyone to push their system that hard for this test, and so went with 5, and just one copy of src.tar.gz in an unspecified location :) ) My first (and second) test was copying the 86M .tgz file, but that was horribly uninteresting. Resetting the machine well into the copy resulted in a zero-byte file after fsck. Truncated. Not a big surprise, really. 3) After that's been going for a while, and while still in progress, pull the power from the machine. Drop power mid-write and you are risking your disk. Yes, I have spiked disks with a nail gun to test RAID in the past, but didn't feel like possibly toasting two disks by powering down the machine mid-write at this time. This system has purpose for me. :) Heh.. my RAID test box has a disk in an external case.. disk 'failure' is simulated by powering off that case... I don't know how many power outages that poor little disk has seen :) So, I hit the reset button on the machine. That should give something similar to (though admittedly, not identical to) a crash. Yes, should suffice for this test ... No, hitting the reset is NOT the same as a power outage. It isn't the same as a crash either -- in the latter case, I'm going to say that it is just different, not easier or harder... so my test is only one kind of failure (and I REALLY didn't feel like pulling a memory module out to simulate a HW failure... :) 4) Fire the machine back up, configure the ccd again, and run fsck a few times to make sure the ccd filesystem is clean. once did the job. Second fsck came up clean. Don't expect different results on the third or fourth... 5) Now unconfigure the ccd. mounted each separately as a non-mirrored ccd file system. 6) Do an md5 checksum of each of the parts of the mirror, and see if they differ. (they shouldn't, but I bet they do!!) I think the md5 test of the mirror elements is bogus here. I don't care if an unallocated block is different. I care if the files are different. I might not even care about that much. 
See below... Umm. There is still a non-zero chance that metadata on one disk will be different than metadata on the other, or that data on one disk will be different than the other... If they differ, tell me how ccd detected that difference, and how it warned you that if the primary drive died that you'd have incorrect data. If they don't differ, go buy a lottery ticket, cause it's your lucky day! ;) I used diff(1) to compare the two trees created by splitting the mirror. No difference found. i.e., ccd(4) mirroring passed a somewhat simplified version of your test. I even modified one of the files to make sure I didn't blow the diff command usage... 188M of files in the tree, no differences. I will admit I was pleasantly surprised, though not totally shocked that it did. With only one IO thread, I'm not overly surprised with these results... My first clue was what happened when I tried to interrupt the copy of a single very large file to the ccd(4) file system. Even though many megabytes had been transferred, by the time fsck
Re: Updated CCD Mirroring HOWTO
right is sometimes a bit more work... :) CCD is easy to set up (once you figure out the steps) and I think it provides some protection against harddisk failures. There is *some* protection, provided one can guarantee the mirrors are in-sync at ccd configuration time. Here's what I'd encourage you (or anyone else) to do: 1) Create a ccd as you describe in the HOWTO and mount the filesystem. 2) Start extracting 5 copies of src.tar.gz onto the filesystem ( simultaneously is preferred, but basically anything that will generate a lot of IO here is what is needed). 3) After that's been going for a while, and while still in progress, pull the power from the machine. 4) Fire the machine back up, configure the ccd again, and run fsck a few times to make sure the ccd filesystem is clean. 5) Now unconfigure the ccd. 6) Do an md5 checksum of each of the parts of the mirror, and see if they differ. (they shouldn't, but I bet they do!!) If they differ, tell me how ccd detected that difference, and how it warned you that if the primary drive died that you'd have incorrect data. If they don't differ, go buy a lottery ticket, cause it's your lucky day! ;) Later... Greg Oster
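Step 6 of the test above might be sketched as below, comparing the two halves of the split mirror with cksum (portable; md5 on OpenBSD does the same job). The partition names are hypothetical, and two identical stand-in files are used so the sketch runs anywhere:

```shell
# Stand-ins for the two raw component partitions of the split mirror
# (e.g. /dev/rwd0d and /dev/rwd1d -- hypothetical names).
dd if=/dev/zero of=/tmp/mirror_half0 bs=1k count=16 2>/dev/null
cp /tmp/mirror_half0 /tmp/mirror_half1

# Checksum each half. If the mirror was truly in sync at the moment of
# the crash, the sums match; ccd itself never performs this check or
# warns you when the halves have drifted apart.
sum0=$(cksum /tmp/mirror_half0 | awk '{print $1}')
sum1=$(cksum /tmp/mirror_half1 | awk '{print $1}')
if [ "$sum0" = "$sum1" ]; then
    echo "mirror halves identical"
else
    echo "mirror halves DIFFER"
fi
```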