Re: /bsd raid0 Error re-writing parity!

2009-07-15 Thread Greg Oster
[please CC me on any replies, as I'm not on m...@openbsd.org]

Siju George writes:
 Hi,
 
 I am not able to re write the parity for my Raid set.
 
 # raidctl -Sv raid0
 raid0 Status:
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.
 raidctl: ioctl () failed
 # raidctl -sv raid0
 raid0 Components:
/dev/wd0d: failed
/dev/wd1d: optimal

You cannot rebuild parity in this case because one of your disks has 
failed (parity re-writing can only happen if all the disks are 'good').  
If you wish to attempt to rebuild that disk, you can do so 
with: raidctl -vR /dev/wd0d raid0
to do a reconstruct-in-place.
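
As a rough sketch (assuming wd0d really is the component you want to 
rebuild onto, and the underlying disk is healthy):

 raidctl -vR /dev/wd0d raid0   # reconstruct in place onto the failed component
 raidctl -S raid0              # check reconstruction progress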

Of course, finding out the real reason for why that disk has failed 
should also be on the top of your TODO list :)  (e.g. if the disk is 
having physical read/write errors, you probably want to replace it 
before doing the rebuild.)

Later...

Greg Oster



Re: RaidFrame woes on 4.2 (RAIDFRAME: failed rf_ConfigureDisks with 2)

2007-10-14 Thread Greg Oster
knitti writes:
 Hi,
 
 I tried to set up a RAID 1 softraid with raidframe, but no matter what
 I try, the RAID refuses to configure. So please, if anyone has an idea
 what I may have missed...
 
 # raidctl -C raid0.conf raid0
 raidctl: ioctl (RAIDFRAME_CONFIGURE) failed
 
 this adds the following lines to the dmesg buffer:
 
 raidlookup on device: /dev/wd3d  failed !
 ^
I suspect you have an extra space after wd3d in the config file... 
And, unfortunately, that annoying little non-feature is enough to 
stop RAIDframe in its tracks... :(

(A fix for the issue is here: 
http://cvsweb.netbsd.org/bsdweb.cgi/src/sbin/raidctl/rf_configure.c.diff?r1=1.19&r2=1.20
)

Otherwise what you have is just fine..
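
For reference, a minimal RAID 1 config sketch (the component device 
names here are just placeholders; note there must be no trailing 
whitespace after the device paths):

 START array
 # numRow numCol numSpare
 1 2 0

 START disks
 /dev/wd2d
 /dev/wd3d

 START layout
 # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
 128 1 1 1

 START queue
 fifo 100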

Later...

Greg Oster



Re: RAID1 powerloss - can parity rewrite be safely backgrounded?

2007-09-28 Thread Greg Oster
Matt writes:

 As for the suggestion of hardware raid - unfortunately this is a live 
 server. If I migrate it to another machine I will definitely try 
 hardware raid 
 I know it is a lot faster

Really? :)  There is no guarantee that a hardware RAID is faster than 
a software RAID, or vice-versa.  There is also no guarantee that a 
commercial software RAID solution is faster than RAIDframe... ;)

Hardware RAID is just software RAID on a card.  And so whether a
hardware implementation of software RAID is faster or slower than
a traditional software RAID just depends on where the bottlenecks 
have been moved to :)  Filesystems, data mixes, and underlying 
hardware will still all be important parts... 

 but would that solve the parity problem on 
 boot completely? 'man bio' doesn't seem to answer that.

It depends on how the hardware RAID card keeps track of what parity 
bits are up-to-date :)  If you don't have a good battery in the thing, 
then you might just be in the same boat as you are with RAIDframe
(but because it's all hidden, you might not know it!).

Don't let the idea that because it's hardware RAID it's automatically 
better lull you into a false sense of security: understand the 
features and benefits of both, do the analysis, and pick the one that 
will work best for you.

Later...

Greg Oster



Re: RAID1 powerloss - can parity rewrite be safely backgrounded?

2007-09-28 Thread Greg Oster
Brian A. Seklecki writes:
 raid(4) hasn't been touched in a while (years), so short answer: No.
 
 NetBSD is still actively committing to it, though, and has functional 
 background parity recalculation.

Just to be clear here: the background parity checking in NetBSD as of 
today is functionally the same as what OpenBSD has right now.

The implications here are as follows: if the parity is checked in the 
background, and a non-parity component should fail, there is a very 
low, but non-zero probability of data loss.  The longer it takes to 
check (and correct, if necessary) the parity, the greater the chance 
of loss.  The value of your data should dictate whether you can live 
with that increase in risk.

For the record, I do the parity checking in the background on all the 
machines I look after.  Since most of them can complete the check in 
under an hour, there is that one hour window where some fragments of 
corruption *may* have occurred (and that didn't get caught with a 
filesystem check). 

 I understand there is interest in replacing RAIDFrame instead of 
 resynchronizing the subtree.
 
 In the mean time, find a hardware RAID Controller that can be managed by 
 OpenBSD via bio(4) and grab a UPS that works with upsd(8).

I worry more about a hardware RAID card forgetting its configuration 
after a power outage than I do about parity checking in the 
background :)  ("What do you mean these 14 disks in this 2TB hardware 
RAID array are now all 'unassigned'!?!?!?!"  That wasn't a fun day.)

Later...

Greg Oster

 On Thu, 27 Sep 2007, Rob wrote:
 
  On 9/25/07, Matt [EMAIL PROTECTED] wrote:
  I'm running a RAID1 mirror on OpenBSD 4.1 (webserver)
  On a power failure the parity becomes dirty and needs rewriting, which
  results in  1.5 hours 'downtime'.
  Is it safe to background this in /etc/rc or is that a no-no?
 
  I found a reference this was possible/safe on-list but it was a) 2003
  and b) dealt with RAID5.
  I'd like to make sure I am not doing something dangerous.
 
  I frankly don't know enough to guarantee that this is safe, or not,
  but I had a RAID1 with big disks on an ancient machine that took about
  26 hours to check parity (! -- this wasn't my idea), and I modified
  its rc to boot up, and then begin performing the parity check in the
  background.
 
  The only caveat I would give is that the operating system was
  installed and running on a 3rd, separate disk, and that network access
  to the mirrored drives was disabled until the parity rewrite was
  complete.
 
  - R.



Re: Seeking info for RAID 1 on OpenBSD

2007-08-04 Thread Greg Oster
L. V. Lammert writes:
 On Fri, 3 Aug 2007, Joel Knight wrote:
 
  --- Quoting HDC on 2007/08/02 at 20:26 -0300:
 
   Read this...
    http://www.packetmischief.ca/openbsd/doc/raidadmin/
    http://www.packetmischief.ca/openbsd/
  
 
  I used to use raidframe and followed the procedures in that doc for
  doing so, but now there's no point. If the system requires any type of
  raid, go hardware. Long live bio(4).
 
 IF you choose to NOT use a h/w controller, use rsync instead. Permits
 quick recovery in the case of a drive failure (swap drive cables & 
 reboot), does not require lengthy parity rebuild.

And you only lose the data written since the last rsync... 
and your system probably goes down instead of staying up until you 
can fix it.. 

RAIDframe, like hardware RAID and rsync, is just another tool.  
Understand the pros and cons of each, but be willing to accept the 
risks associated with whatever you choose... (if you think hardware 
RAID is riskless, then you've never had a 2TB RAID set suddenly 
decide that all components were offline and mark them as such!)

For the folks who dislike the long parity checks... If you're 
willing to accept a window during which some of your data *might* be 
at risk, change: 
 raidctl -P all
to something like
 sleep 3600 ; raidctl -P all &
in /etc/rc .  This will, of course, delay the start of the parity 
computation for an hour or so, giving your system a chance to do the 
fscks and get back to multi-user as quickly as possible.

The risk here is as follows (this is for RAID 1.. risks for RAID 5 
are slightly higher): 
  1) even though parity is marked 'dirty', it might actually be in 
sync.  In this case if you have a component failure, your data is 
fine.
  2) until the parity check is done, only the 'master' component is 
used for reading.  But any writes that are done are mirrored to both 
components.  That means that when the fsck is being done, any 
problems found will be fixed on *both* components, and writes will 
keep the two in sync even before parity is checked.
  3) Where the risk of data loss comes in is if the master dies 
before the parity check gets done.  In this case, data on the master 
that was not re-written or that was out-of-sync with the slave will 
be lost.  This could result in the loss of pretty much anything.

The important thing here is for you to evaluate your situation and 
decide whether this level of risk is acceptable... For me, I use the 
equivalent to 'sleep 3600' on my home desktop.. and slightly modified 
versions of it on other home servers and other boxen I look after.. 
But don't blindly listen to me or anyone else -- learn what the risks 
are for your situation, determine what level of risk you can accept, 
and go from there...

Later...

Greg Oster



Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64

2007-03-08 Thread Greg Oster
Siju George writes:
 On 3/8/07, Greg Oster [EMAIL PROTECTED] wrote:
  Siju George writes:
   In my dmesg at one point it says
  
   ==
   Kernelized RAIDframe activated
   dkcsum: wd0 matches BIOS drive 0x80
   dkcsum: wd1 matches BIOS drive 0x81
   root on wd0a
   
 
  So this gets printed from autoconf.c   but it *shouldn't* since
 
  boothowto |= RB_DFLTROOT;
 
  in rf_openbsdkintf.c should cause the setroot() function to bail
  before printing the above  So for some reason it's not calling
  the appropriate bits in rf_buildroothack() in rf_openbsdkintf.c
  But exactly why, I have no idea...
 
  [snip]
   Could you please shed any light on why my root device is not raid0
   but wda0 still?
 
  No idea right now.. if you build a kernel with RAIDDEBUG defined and
  send the dmesg from that, I might be able to provide additional
  info...
 
 
 alright thankyou :-)
 
 here is it. hope it will help you see more into the issue :-)
[snip]
 Kernelized RAIDframe activated
 Searching for raid components...
 dkcsum: wd0 matches BIOS drive 0x80
 dkcsum: wd1 matches BIOS drive 0x81
 root on wd0a
 rootdev=0x0 rrootdev=0x300 rawdev=0x302
 RAIDFRAME: protectedSectors is 64.
 raid0: Component /dev/wd0d being configured at row: 0 col: 0
  Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
  Version: 2 Serial Number: 200612010 Mod Counter: 844
  Clean: Yes Status: 0
 raid0: Component /dev/wd1d being configured at row: 0 col: 1
  Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
  Version: 2 Serial Number: 200612010 Mod Counter: 844
  Clean: Yes Status: 0
 RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep limit.
 raid0 (root)
 #

So this is still not the output I'd expect... what does 'disklabel wd0' 
and 'disklabel wd1' say?  Are wd0d and wd1d of type FS_RAID ??  You should 
be seeing a Component on wd0d and then the full component label, and that 
should be printed before the dkcsum bits... It's still almost as 
though RAID_AUTOCONFIG isn't defined... (but it is, since the 
Searching... line above is printed...)
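
A quick way to check (assuming the standard label layout):

 disklabel wd0   # the fstype column for the 'd' partition should read RAID
 disklabel wd1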

Later...

Greg Oster



Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64

2007-03-08 Thread Greg Oster
Siju George writes:
 On 3/8/07, Greg Oster [EMAIL PROTECTED] wrote:
  [snip]
   Kernelized RAIDframe activated
   Searching for raid components...
   dkcsum: wd0 matches BIOS drive 0x80
   dkcsum: wd1 matches BIOS drive 0x81
   root on wd0a
   rootdev=0x0 rrootdev=0x300 rawdev=0x302
   RAIDFRAME: protectedSectors is 64.
   raid0: Component /dev/wd0d being configured at row: 0 col: 0
Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
Version: 2 Serial Number: 200612010 Mod Counter: 844
Clean: Yes Status: 0
   raid0: Component /dev/wd1d being configured at row: 0 col: 1
Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
Version: 2 Serial Number: 200612010 Mod Counter: 844
Clean: Yes Status: 0
   RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep lim
 it.
   raid0 (root)
   #
 
  So this is still not the output I'd expect... what does 'disklabel wd0'
  and 'disklabel wd1' say?  Are wd0d and wd1d of type FS_RAID ??
 
 
 nope :-(
 So that is the reason right?

Yes.

 is there any hope of fixing it now?

It should just work to change the fstype from 4.2BSD to RAID...  as long as you're 
never actually mounting /dev/wd0d or /dev/wd1d anywhere it'll be 
fine... 
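
Something like this should do it (a sketch; pick the right partitions 
for your setup):

 disklabel -E wd0   # change the fstype of the 'd' partition from 4.2BSD to RAID
 disklabel -E wd1   # same for the second component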

 Will the raid be functioning right actually?
 Do you want me to recreate it with FS_RAID?

You should only need to tweak the disklabel.  If you boot single-user 
you should see root on /dev/raid0a .. at that point you can mount / 
read-write and fix /etc/fstab if necessary.  You shouldn't need to 
rebuild the RAID set... 
 
 ==
[snip]
 =
 
 
 You should
  be seeing a Component on wd0d and then the full component label, and that
  should be printed before the dkcsum bits... It's still almost as
  though RAID_AUTOCONFIG isn't defined... (but it is, since the
  Searching... line above is printed...)
 
 
 RAID_AUTOCONFIG is defined but for that to work the FS type shoud be
 FS_RAID right?

Yes...  if it's not FS_RAID, then for i386/amd64/(and others) it 
won't even consider the partition for autoconfig... 

 Do you think this setup is bad actually?

Nope... just needs a disklabel change and it should work...

Later...

Greg Oster



Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64

2007-03-07 Thread Greg Oster
Siju George writes:
 On 3/6/07, Greg Oster [EMAIL PROTECTED] wrote:
  Siju George writes:
 
  It's working just fine... just probably telling you a bit more than
  you really wanted to know :)
 
  Later...
 
 
 Greg,
 
 Seeing that you work on RAIDFRAME let me dare to ask you one more thing :-)
 
 In my dmesg at one point it says
 
 ==
 Kernelized RAIDframe activated
 dkcsum: wd0 matches BIOS drive 0x80
 dkcsum: wd1 matches BIOS drive 0x81
 root on wd0a
 
 
 Shouldn't the root be on raid0a ?

I don't know what state OpenBSD is in 
with respect to root-on-RAID.  

 Since the dmesg again shows
 
 ===
 raid0 (root)
 ===
 
 and raidctl shows
 
 =
  Autoconfig: Yes
Root partition: Yes
 
 
 for both drives?

what does 'mount' say for '/'?  RAIDframe used to do a bit of 
'hijacking' of the boot disk in order to get itself in as /, 
but I don't know the details in OpenBSD... 

[snip]
 
 Could you please shed any light on why my root device is not raid0
 but wd0a still?

That part is specific to OpenBSD... 

 Thankyou so much ( especially for the simple make file introduction on
 your website )
 
 kind Regards
 
 Siju
 
 For my Full dmesg
 
 
 OpenBSD 4.0 (GENERIC.RAID2) #0: Fri Nov 24 20:28:14 IST 2006
 [EMAIL PROTECTED]:/usr/src/sys/arch/amd64/compile/GENERIC.
 RAID2
 real mem = 1039593472 (1015228K)
 avail mem = 878211072 (857628K)
 using 22937 buffers containing 104165376 bytes (101724K) of memory
 mainbus0 (root)
 bios0 at mainbus0: SMBIOS rev. 2.3 @ 0xfc650 (54 entries)
 bios0: Acer Aspire Series
 cpu0 at mainbus0: (uniprocessor)
 cpu0: AMD Athlon(tm) 64 Processor 3400+, 2193.94 MHz
 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36
 ,CFLUSH,MMX,FXSR,SSE,SSE2,SSE3,NXE,MMXX,FFXSR,LONG,3DNOW2,3DNOW
 cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
 64b/line 16-way L2 cache
 cpu0: ITLB 32 4KB entries fully associative, 8 4MB entries fully associative
 cpu0: DTLB 32 4KB entries fully associative, 8 4MB entries fully associative
 pci0 at mainbus0 bus 0: configuration mode 1
 pchb0 at pci0 dev 0 function 0 ATI RS480 Host rev 0x10
 ppb0 at pci0 dev 1 function 0 ATI RS480 PCIE rev 0x00
 pci1 at ppb0 bus 1
 vga1 at pci1 dev 5 function 0 ATI Radeon XPRESS 200 rev 0x00
 wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
 wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
 pciide0 at pci0 dev 17 function 0 ATI IXP400 SATA rev 0x80: DMA
 pciide0: using irq 11 for native-PCI interrupt
 pciide0: port 0: device present, speed: 1.5Gb/s
 wd0 at pciide0 channel 0 drive 0: ST3120827AS
 wd0: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
 wd0(pciide0:0:0): using BIOS timings, Ultra-DMA mode 6
 pciide0: port 1: device present, speed: 1.5Gb/s
 wd1 at pciide0 channel 1 drive 0: ST3120827AS
 wd1: 16-sector PIO, LBA48, 114473MB, 234441648 sectors
 wd1(pciide0:1:0): using BIOS timings, Ultra-DMA mode 6
 pciide1 at pci0 dev 18 function 0 ATI IXP400 SATA rev 0x80: DMA
 pciide1: using irq 5 for native-PCI interrupt
 ohci0 at pci0 dev 19 function 0 ATI IXP400 USB rev 0x80: irq 4,
 version 1.0, legacy support
 usb0 at ohci0: USB revision 1.0
 uhub0 at usb0
 uhub0: ATI OHCI root hub, rev 1.00/1.00, addr 1
 uhub0: 4 ports with 4 removable, self powered
 ohci1 at pci0 dev 19 function 1 ATI IXP400 USB rev 0x80: irq 4,
 version 1.0, legacy support
 usb1 at ohci1: USB revision 1.0
 uhub1 at usb1
 uhub1: ATI OHCI root hub, rev 1.00/1.00, addr 1
 uhub1: 4 ports with 4 removable, self powered
 ehci0 at pci0 dev 19 function 2 ATI IXP400 USB2 rev 0x80: irq 4
 usb2 at ehci0: USB revision 2.0
 uhub2 at usb2
 uhub2: ATI EHCI root hub, rev 2.00/1.00, addr 1
 uhub2: 8 ports with 8 removable, self powered
 piixpm0 at pci0 dev 20 function 0 ATI IXP400 SMBus rev 0x81: SMI
 iic0 at piixpm0
 unknown at iic0 addr 0x2f not configured
 pciide2 at pci0 dev 20 function 1 ATI IXP400 IDE rev 0x80: DMA,
 channel 0 configured to compatibility, channel 1 configured to
 compatibility
 azalia0 at pci0 dev 20 function 2 ATI IXP450 HD Audio rev 0x01: irq 5
 azalia0: host: High Definition Audio rev. 1.0
 azalia0: codec: Realtek ALC880 (rev. 8.0), HDA version 1.0
 audio0 at azalia0
 pcib0 at pci0 dev 20 function 3 ATI IXP400 ISA rev 0x80
 ppb1 at pci0 dev 20 function 4 ATI IXP400 PCI rev 0x80
 pci2 at ppb1 bus 2
 re0 at pci2 dev 3 function 0 Realtek 8169 rev 0x10: irq 5, address
 00:16:17:20:2a:a6
 rgephy0 at re0 phy 7: RTL8169S/8110S PHY, rev. 2
 pchb1 at pci0 dev 24 function 0 AMD AMD64 HyperTransport rev 0x00
 pchb2 at pci0 dev 24 function 1 AMD AMD64 Address Map rev 0x00
 pchb3 at pci0 dev 24 function 2 AMD AMD64 DRAM Cfg rev 0x00
 pchb4 at pci0 dev 24 function 3 AMD AMD64 Misc

Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64

2007-03-07 Thread Greg Oster
Siju George writes:
 On 3/6/07, Greg Oster [EMAIL PROTECTED] wrote:
  Siju George writes:
 
  It's working just fine... just probably telling you a bit more than
  you really wanted to know :)
 
  Later...
 
 
 Greg,
 
 Seeing that you work on RAIDFRAME let me dare to ask you one more thing :-)

Bah... I hit the Send button on that last email sooner than I 
wanted to :( 

 In my dmesg at one point it says
 
 ==
 Kernelized RAIDframe activated
 dkcsum: wd0 matches BIOS drive 0x80
 dkcsum: wd1 matches BIOS drive 0x81
 root on wd0a
 

So this gets printed from autoconf.c   but it *shouldn't* since 

boothowto |= RB_DFLTROOT;

in rf_openbsdkintf.c should cause the setroot() function to bail 
before printing the above  So for some reason it's not calling 
the appropriate bits in rf_buildroothack() in rf_openbsdkintf.c
But exactly why, I have no idea...

[snip]
 Could you please shed any light on why my root device is not raid0
 but wd0a still?

No idea right now.. if you build a kernel with RAIDDEBUG defined and 
send the dmesg from that, I might be able to provide additional 
info... 

 Thankyou so much ( especially for the simple make file introduction on
 your website )

:) 

Later...

Greg Oster



Re: raid dmesg output and raidctl -sv output shows differrent status for raidframe mirror on OpenBSD 4.0 amd64

2007-03-05 Thread Greg Oster
Siju George writes:
 Hi,
 
 The dmesg Output Shows
 
 Clean: Yes
 
 for both Raid Components as shown below
 
 
 
 raid0: Component /dev/wd0d being configured at row: 0 col: 0
  Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
  Version: 2 Serial Number: 200612010 Mod Counter: 820
  Clean: Yes Status: 0
 raid0: Component /dev/wd1d being configured at row: 0 col: 1
  Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
  Version: 2 Serial Number: 200612010 Mod Counter: 820
  Clean: Yes Status: 0
 raid0 (root)
 
 =
 
 but raidctl shows
 
 Clean: No
 
 as shown below
 
 Could Some one tell me why this is so?
 it is the same state even after reboots.

The value Yes or No comes directly from the component labels on the 
disks.

If the parity is known good (i.e. the set is clean) when the RAID
sets are unconfigured (actually, when the last open partition is 
unmounted), then the value in the component labels will be set to Yes. 
When a RAID set is configured and a partition is opened/mounted, the 
value in the component labels will be set to No.  And so unless 
things get unmounted/unconfigured correctly, the value will remain at 
No until the parity gets checked.  

What you are seeing here is: 
 a) the values reported by dmesg are from *before* any partitions on 
raid0 get opened.  So if the RAID set was known clean, you'll see a 
value of Yes printed for each component, because that's what they 
got set to at the last shutdown/unmount/unconfigure/etc.

 b) the values reported by raidctl are from *after* a partition on 
raid0 has been opened (even 'raidctl -vs raid0' ends up opening 
/dev/raid0c or whatever, resulting in that clean flag being changed 
from Yes to No).  So it will always say No here, since that 
will be the current value in the component labels.

 Which one should I believe?

Both of them :)  They are both correct for the time at which they are 
examining the datapoint in question.  That said, the line to really care 
about is this one:

 Parity status: clean
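
(e.g., assuming raid0:

 raidctl -s raid0 | grep 'Parity status'

will show just that line.)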

 Is the Raid not working properly?

It's working just fine... just probably telling you a bit more than 
you really wanted to know :) 

Later...

Greg Oster



Re: Raidframe parity problems

2006-12-04 Thread Greg Oster
Julian Labuschagne writes:
 
 Then I had to test the server before putting it into a production 
 environment. So I switched off /dev/wd3a.

So at this point wd3a will get marked as failed...

 The system halted itself when I did that... 

oops.  So it wasn't a clean shutdown, and so the parity bits won't 
have been marked as clean

 I started the system and it gave me the following error:
 raid0: Error re-writing parity.

Right.
 
 When I run the command: raidctl -s raid0
 raid0 Components:
 /dev/wd1a: optimal
 /dev/wd2a: optimal
 /dev/wd3a: failed
 Spares:
 /dev/wd4a: spare
 Parity status: DIRTY
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete
 
 I have tried running the following command:
 raidctl -P raid0
 raid0: Parity status: Dirty
 raid0: Initiating re-write of parity
 raid0: Error re-writing parity!
 
 I'm not sure what is going on here my kernel is standard except for 
 raidframe support compiled in. I just can't seem to rebuild the array.
 
 Anybody run into this problem before? Any help would be appreciated.

Everything is behaving normally.  The system can't make sure the 
parity is up-to-date because it is missing a component.  It is a bit 
of a mis-nomer to call the parity DIRTY at this point because, 
well, that's the only information you have to go on, and it's as good 
as it's going to get.  But it's been left the way it is so that one 
can tell if the parity is known good or known questionable...

In any event, 'raidctl -P' isn't going to do anything useful until 
you get wd3a (or its replacement) added back into the array...
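
A sketch of the recovery, assuming wd3a comes back (or is replaced) and 
the wd4a hot spare is still available:

 raidctl -F /dev/wd3a raid0   # rebuild the failed component onto the hot spare
 raidctl -S raid0             # watch the reconstruction progress
 raidctl -P raid0             # the parity re-write will succeed once the set is whole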

Later...

Greg Oster



Re: RAIDFrame parity rebuild: why so slow?

2006-10-03 Thread Greg Oster
Jeff Quast writes:
 On 10/3/06, Joerg Zinke [EMAIL PROTECTED] wrote:
  On Mon, 02 Oct 2006 20:11:36 +0200
  nothingness [EMAIL PROTECTED] wrote:
 
   Hi all,
  
 I've been using RAIDFrame on OpenBSD since 3.1 and in 4 years I've
   never seen any performance improvement in getting the system to work
   any faster at rebuilding parity after a hard shutdown. I've tried
   RAID1, RAID5, SCSI drives, IDE drives, processors from PentiumII 400s
   to Athlon64 3200+ and it has *always* been ridiculously slow at
   rebuilding. Just a 9G RAID5 partition takes over 2 hours. A 60G RAID1
   takes 11 hours. 11!!! Before flaming me to say, just go and edit the
   code, it's never been out of beta or whatever, explain why compared
   to other OSes it's always so slow, even to build the first time
   around. Linux's code in particular comes to mind.
 
  maybe this is one of the reasons why raidframe is not officially
  supported and not enabled in stable kernel. i think another reason is
 
 or that it doubles the size of a kernel for a function 5% of openbsd users 
 use.

RAIDframe on i386 archs used to be about 500K, which is ~10% of the 
current size of /bsd.  In a certain other BSD, RAIDframe now weighs in 
at about 148K for i386.

[snip]
 Raidframe was originally a simulator. A simulator. It was never meant
 to be a kernel driver. It is not meant to ensure speed. It is not
 meant to actually be used to store real data.

RAIDframe was developed as a framework.  It wasn't just a simulator. 
It wasn't just a user-land RAID driver.  It wasn't just a kernel driver.
It was built with all three to allow rapid prototyping of new types 
of RAID.  Yes, there is some overhead to this, but it's not as large 
as the code size might suggest... (e.g. compare the performance
difference between CCD vs RAID0..)

Later...

Greg Oster



Re: Replacing a failed HD in a raidframe array

2006-09-07 Thread Greg Oster
Jason Murray writes:
[snip]
 
 So according to the raidctl(8) once I add the new HD to the system I do a 
 raidctl -a /dev/hd1d to add it as a spare, then do a raidctl -F component1 
 raid0 to force a rebuild. Then I would modify my /etc/raid0.conf to 
 reflect my new device (which actually won't need modification in my case).
 
 First question: do I have this correct?

Yes.  (s/hd1d/wd1d/ , of course)
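
i.e., roughly (assuming the new disk shows up as wd1):

 raidctl -a /dev/wd1d raid0    # add the new disk as a hot spare
 raidctl -F component1 raid0   # fail component1 over to the spare and rebuild
 raidctl -S raid0              # monitor the rebuild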

 Second question: if the rebuild fails at 48% with bad disk block errors 
 does this mean that wd0 is bad?

Most likely.  (it could be the new disk too, but that's less likely)

Later...

Greg Oster



Re: RAIDframe Root on RAID -- configuring dump device

2006-08-29 Thread Greg Oster
Josh Grosse writes:
 Has anyone using Root on RAID managed to point their dumpdev at a swap space,
 either within a RAID array or on a standard swap partition?

Dumping to a standard swap partition on a RAID set is not supported.

 I have not, and a search of the archives only came up with one posting, with
 a similar question, but no answer: 
 
 http://marc.theaimsgroup.com/?l=openbsd-miscm=111763609916743w=2
 
 I'm running -current on i386, and have just successfully implemented RAID 
 level 1 mirroring.  I am using two Autoconfig devices: 
 
   raid0 (ffs partitions) which is also set as the Root partition
   raid1 (swap).
 
 My kernel configuration is GENERIC plus RAIDframe, which means that my
 config line reads:
 
 config bsd swap generic
 
 When booting normally, with raid0a as root, I get this kernel message right
 before init starts:
 
 swapmount: no device
 
 and then during rc I get:
 
 savecore: no core dump
 
 I have tried modifying the config line.  If I use:
 
 config bsd root on wd0a swap on wd0b
 
 then I do get an unmirrored partition as my swap_device, and it is also a
 dump device. 

Does the config syntax support dumps on wd0b?  Dunno if you can use 
something like:

 config bsd root on wd0a swap on wd0b dumps on wd0b

but that might be sufficient... 

 But ... adding /dev/raid1b doesn't work -- adding this device 
 to /etc/fstab seems to be ignored, and swapctl -a /dev/raid1b fails with 
 file not found.  raid1b is an unacceptable keyword for kernel config. 

Dunno about any of this  I'd have suggested using 
'swapctl -D /dev/wd0b', but I don't believe that'd work for you... 

 Anyone with a successful swap/dump setup who might be able to point me to
 what I'm missing?

You should be able to do it, but not to swap on a RAID set...

Later...

Greg Oster



Re: RAIDframe Root on RAID -- configuring dump device

2006-08-29 Thread Greg Oster
Josh Grosse writes:
 On Tue, Aug 29, 2006 at 02:28:50PM -0600, Greg Oster wrote:
  Josh Grosse writes:
   Has anyone using Root on RAID managed to point their dumpdev at a swap sp
 ace,
   either within a RAID array or on a standard swap partition?
  
  Dumping to a standard swap partition on a RAID set is not supported.
 
 Could you clarify what you mean? 

As in, the raiddump() function returns ENXIO, and has a comment 
saying "Not implemented."  In other words, ENOWORKIE :)

 I have a raid1b partition marked as
 swap, and a wd0b partition marked as swap, and I have not figured out how
 to get a dump device assigned, so far, unless I use swap on wd0b -- which
 is unmirrored.  I have no problem with having an unprotected dump area, but
 I am concerned about using the partition as swap space.

Right... If you're going to all the trouble of having a system on 
RAID, you really want swap on RAID too... but not dump... 

 snip 
 
   I have tried modifying the config line.  If I use:
   
    config bsd root on wd0a swap on wd0b
   
   then I do get an unmirrored partition as my swap_device, and it is also a
   dump device. 
  
  Does the config syntax support dumps on wd0b?  Dunno if you can use 
  something like:
  
   config bsd root on wd0a swap on wd0b dumps on wd0b
  
  but that might be sufficient... 
 
 Since root on wd0a swap on wd0b assigns the dump device, isn't dumps
 on wd0b redundant?  Or have I misunderstood you?  Do you think an explicit
 assignment would change this behavior?

I'm not sure if OpenBSD even supports that syntax...  (The OS which 
with I'm most familiar does support that syntax, and it allows you to 
have swap and dump space in different places, if that's what you want/
need...  And yes, the explicit assignment, if you can do it, would 
change it... )

   But ... adding /dev/raid1b doesn't work -- adding this device 
   to /etc/fstab seems to be ignored, and swapctl -a /dev/raid1b fails wit
 h 
   file not found.  raid1b is an unacceptable keyword for kernel config. 
  
  Dunno about any of this  I'd have suggested using 
  'swapctl -D /dev/wd0b', but I don't believe that'd work for you... 
 
 There is no -D in current.  Is there an uncommitted swapctl.c in development?

Dunno... swapctl in NetBSD has it, which is why I was going to 
suggest it, but it seems that swapctl in OpenBSD doesn't have it...
  
   Anyone with a successful swap/dump setup who might be able to point me to
   what I'm missing?
  
  You should be able to do it, but not to swap on a RAID set...
 
 I can swap on a RAID set just fine, but only if I leave the config line in
 GENERIC untouched.  But if I do that, I have no dump device.  I seem to be 
 able to swap and dump to non-raid thru altering the config line, as I 
 described.  But if I do that, I cannot then add a RAID set to the swap list.

Yuck... :-/  I'll have to defer to others more familiar with OpenBSD 
to comment on how to get around that little problem But I do know 
that dumping to a RAID device will not work at all. 

Later...

Greg Oster



Re: raidctl on a live raid array, and the kernel debugger

2006-07-17 Thread Greg Oster
Jason Murray writes:
 I've tried, again, to fix my raid array with raidctl -R. I did it on the 
 console port this time so I could capture the output from ddb
 
 Here is some output:
 
 # raidctl -s raid0
 raid0 Components:
 /dev/wd0d: failed
 /dev/wd1d: optimal
 No spares.
 Parity status: DIRTY
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.
 
 So I attempt an inplace  reconstruction of wd0d.
 
 #
 # raidctl -R /dev/wd0d raid0
 Closing the opened device: /dev/wd0d
 About to (re-)open the device for rebuilding: /dev/wd0d
 RECON: Initiating in-place reconstruction on
 row 0 col 0 - spare at row 0 col 0.
 Quiescence reached...
 
 I then use raidctl -S to monitor the reconstruction. Things go well 
 until the 48% mark. Then I get:
 
 wd1d:  uncorrectable data error reading fsbn 111722176 of 
 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
 /wd1: transfer error, downgrading to Ultra-DMA mode 4
 wd1(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 4
 wd1d:  uncorrectable data error reading fsbn 111722176 of 
 111722176-111722303 (wd1 bn 114343984; cn 113436 tn 7 sn 55), retrying
 wd1d:  uncorrectable data error reading fsbn 111722248 of 
 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1), retrying
 wd1d:  uncorrectable data error reading fsbn 111722248 of 
 111722176-111722303 (wd1 bn 114344056; cn 113436 tn 9 sn 1)
 raid0: IO Error.  Marking /dev/wd1d as failed.
 Recon read failed !
 panic: RAIDframe error at line 1518 file 
 /usr/src/sys/dev/raidframe/rf_reconstruct.c
 Stopped at  Debugger+0x4:   leave
 RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
 
 DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
 
 This concerns me because I need wd1d to rebuild my failed wd0d. Any 
 ideas? Drive cables maybe? Any help is greatly appreciated.

You have recent backups, right?  wd1 is failing/dying.  At this point 
you're probably better off in attempting to use 'dd' to recover as 
many bits as you can... (if you do a 'dump' of the filesystem you can 
probably tell from that whether or not there is any 'live data' in 
the portion that is unreadable  if there isn't any live data, 
then you can use 'dd' to make as much of a copy as possible of wd1, 
and use that as the base for reconstructing the RAID set.)
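
(A rough sketch of that dd step, assuming the copy goes to a fresh disk 
that shows up as wd2; conv=noerror,sync skips the unreadable blocks and 
pads them so the offsets stay lined up:)

 dd if=/dev/rwd1d of=/dev/rwd2d bs=64k conv=noerror,sync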

Later...

Greg Oster



Re: raidctl on a live raid array, and the kernel debugger

2006-07-12 Thread Greg Oster
Jeff Quast writes:
 
 My first few months with raidframe caused many kernel panics. With 30
 minutes of parity checking, this was a difficult learning experience.
 I was initially led to believe that raidframe was hardly stable (and
 therefore disabled in GENERIC).
 
 However, as I gained experience with raidctl and raidframe, and traced
 the panics to code level, I almost always found the panics were caused
 by my misuse or misinterpretation of raidctl(8). A small book could
 probably be written on the many different situations you can find
 yourself in with raidframe.
 
 I haven't had a kernel panic for a long time, and have had 3 disks fail
 since on a level 5 raid without issue reconstructing, changing
 geometry, etc. If memory serves me, I may have reconstructed a mounted
 raidset, though given the choice, I certainly wouldn't.

RAIDframe was built to allow reconstructing a mounted RAID set... in 
fact, it goes to a lot of trouble to allow that to happen properly... 
The only 'problem' you might notice would be a performance 
degradation for both the rebuild and any user IO taking place... 

 All in all, I find kernel panics with raidframe is just its way of
 saying Bad choice of arguments :)

RAIDframe in OpenBSD is somewhat lax about checking the input 
provided by raidctl... It works quite well if you don't tell it 
to do anything it's not expecting :-}  (most (all?) of those problems 
have long since been cleaned up -- unfortunately not in the code base 
that's in OpenBSD though :( )

Later...

Greg Oster



Re: no raid reconstruction with autoconfigured sets

2006-06-29 Thread Greg Oster
Walter Haidinger writes:
 Hi!
 
 Summary: raid set reconstruction fails with error rewriting parity
 for sets with non-root autoconfigure enabled, works when disabled.
 It seems as if there is a bug when reading the component label.
 
 Details:
 I'm running a OpenBSD 3.9 GENERIC kernel with RAID enabled.
 That is, no other changes but the ones from raid(4):
pseudo-device raid 4
optionRAID_AUTOCONFIG
 
 I'm running a raid1 mirror of both ide channel master devices.
 After a complete disk failure of wd1, I replaced the faulty drive
 and rebooted (came up in degraded mode on wd0 just fine) and did
 fdisk/disklabel to match wd0 layout which looks as in
 Auto-configuration and Root on RAID of raidctl(8):
 wd[01]a: minimum OpenBSD install (/bsd is a RAID-capable kernel)
 wd[01]e: raid0  (raid0a is /)
 wd[01]f: raid1  (raid1b is swap)
 wd[01]g: raid2  (raid2d is /usr, raid2e is /var, ...)
 All raid sets are set to autoconfigure, raid0 as root autoconfigure.
 
 Then I tried to resync using the method from raidctl(8), bottom of
 Dealing with Component Failures, i.e.:
 # raidctl -a /dev/wd1e raid0
 # raidctl -F component1 raid0
 # raidctl -a /dev/wd1f raid1
 # raidctl -F component1 raid1
 # raidctl -a /dev/wd1g raid2
 # raidctl -F component1 raid2
 
 Only rebuilding root autoconfigued raid0 set succeeded.
 Non-root sets raid1 and raid2 failed with
 raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed.
 
 Adding a spare did work:
 
 # raidctl -a /dev/wd1g raid1

Isn't that the spare you used for raid2 ?

 # raidctl -vs raid1
 raid1 Components:
/dev/wd0f: optimal
   component1: failed
 Spares:
/dev/wd1f: spare

Oh.. but here it's correct..

 Component label for /dev/wd0f:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 298644, Mod Counter: 657
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1024000
RAID Level: 1
Autoconfig: Yes
Root partition: No
Last configured as: raid1
 component1 status is: failed.  Skipping label.
 /dev/wd1f status is: spare.  Skipping label.
 Parity status: DIRTY
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.
 
 However, failure and immediate reconstruction did not work:
 
 # raidctl -F component1 raid1
 # raidctl -vs raid1
 raid1 Components:
/dev/wd0f: optimal
   component1: reconstructing
 Spares:
/dev/wd1f: used_spare
 Component label for /dev/wd0f:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 298644, Mod Counter: 658
Clean: No, Status: 0
sectPerSU: 128, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 1024000
RAID Level: 1
Autoconfig: Yes
Root partition: No
Last configured as: raid1
 component1 status is: reconstructing.  Skipping label.
 raidctl: ioctl (RAIDFRAME_GET_COMPONENT_LABEL) failed

Hmm.. where are the lines saying reconstruction is n% complete? 
(they aren't pretty, but in this case they'd be useful)

 raidctl -F subsequently fails with error rewriting parity.

That error will only come from attempting to check parity on a RAID 
set with a failed component.  It has nothing to do with raidctl -F.

How long did you wait for the reconstruction to finish?  
For the above output, note that it still says reconstructing 
for component1...  When that finishes, it will say spared.  
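
While it is running, something like this shows the progress:

 raidctl -S raid1   # prints reconstruction / parity re-write / copyback percentages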

Later...

Greg Oster



Re: no raid reconstruction with autoconfigured sets

2006-06-29 Thread Greg Oster
Walter Haidinger writes:
 First of all: Thanks for replying to an issue with a
 non-generic kernel! I really appreciate that!

That it was a non-generic kernel didn't even cross my mind... it was 
an issue w/ RAIDframe, and that's why I responded...

 On Thu, 29 Jun 2006, Greg Oster wrote:
 
   Adding a spare did work:
   
   # raidctl -a /dev/wd1g raid1
  
  Isn't that the spare you used for raid2 ?
 
 Sorry, cutpaste error, should have been wd1f.
  
   Hmm.. where are the lines saying reconstruction is n% complete? 
   (they aren't pretty, but in this case they'd be useful)
 
 I'm sorry, I did not record those. Reconstructing did take some time,
 though, I recall checking the progress, nothing suspicious there,

So did the reconstruction actually complete? 

   raidctl -F subsequently fails with error rewriting parity.
  
  That error will only come from attempting to check parity on a RAID 
  set with a failed component.  It has nothing to do with raidctl -F.
 
 Oh yes, of course! Should have mentioned that I've tried raidctl -P 
 after raidctl -F ...

Ok... so the big question is still: how far along was the 
reconstruction?  raidctl -P would fail even if the reconstruct was 
still in progress.
 
  How long did you wait for the reconstruction to finish?  
  For the above output, note that it still says reconstructing 
  for component1...  When that finishes, it will say spared.  
 
 And what about the spare? Shouldn't it replace component1?

It won't replace it in the output of 'raidctl -s', but it will 
replace component1 for all accesses and what-not.. (and will take its 
proper place (with autoconfig turned on) after a reboot (well... 
sans a known bug in rf_reconstruct.c where this line:

  c_label.partitionSize = raidPtr->Disks[srow][scol].partitionSize;

should be added to where it says:

  /*  MORE NEEDED HERE. */
)

 That never happened. Instead, the component1 sequence was:
 failed -> reconstructing -> failed. 

Hmm... I think you should see failed -> reconstructing -> spared
(that's what you'd see if 'component1' was a normal disk...)

You might want to check /var/log/messages* for some indication as to 
why the reconstruction failed... (as well, there should be something 
in there indicating the reconstruction completed, if it did...)

Later...

Greg Oster



Re: RAIDframe, swapping components in a RAID 1 array

2006-05-22 Thread Greg Oster
Paul Wright writes:
 Hi all,
 
 I've followed a set of instructions[1] describing a method of
 installing OpenBSD onto a RAID 1 array created with raidctl using only
 2 disks (sd0b + sd1b).  The basic premise is to first install normally
 onto one disk (sd0b) and then created a degraded RAID 1 array using
 the second disk (sd1b) and a fake third disk (sd2b).  After booting
 off the array you then add the original first (sd0b) disk to the array
 and rebuild.
 
 This works but the changes don't 'stick' between reboots, the array
 promptly forgets about sd0b:
 
 # raidctl -s raid0
 raid0 Components:
   component0: failed
/dev/sd1b: optimal
 No spares.
 Parity status: clean
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.
 
 # raidctl -a /dev/sd0b raid0
 # raidctl -F component0 raid0
 
 # raidctl -s raid0
 raid0 Components:
   component0: spared
/dev/sd1b: optimal
 Spares:
/dev/sd0b: used_spare
 Parity status: clean
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.

Try doing a:

 raidctl -I 605190 raid0

here before rebooting.  I seem to recall a bug related to component 
labels on used spares not being updated properly after a reconstruct, 
and I think re-running the '-I' option was the workaround... 

Later...

Greg Oster



Re: RAID label problem?

2006-03-22 Thread Greg Oster
Xavier Mertens writes:
 Hi,
 
 I'm busy setting up a box with 2 x 80GB disks in RAID1.
 I'm following the procedures found online but, once the RAID is initialized, 
 I got the following error while trying to partition it:
 
 Write new label?: [y]
 disklabel: ioctl DIOCWDINFO: No space left on device
 disklabel: unable to write label
 
 The RAID is up and consistent:
 
 # raidctl -s raid0
 raid0 Components:
/dev/wd0d: optimal
/dev/wd1d: optimal
 No spares.
 Parity status: clean
 Reconstruction is 100% complete.
 Parity Re-write is 100% complete.
 Copyback is 100% complete.
 
 disklabel reports the following:
 
 # disklabel -E raid0
 disklabel: Can't get bios geometry: Device not configured
  
  
   
 Initial label editor (enter '?' for help at any prompt)
  p
 device: /dev/rraid0c
 type: RAID
 disk: raid
 label: fictitious
 bytes/sector: 512
 sectors/track: 128
 tracks/cylinder: 8
 sectors/cylinder: 1024
 cylinders: 156417
 total sectors: 160171392
 free sectors: 160171392
 rpm: 3600
  
  
   
 16 partitions:
 # sizeoffset  fstype [fsize bsize  cpg]
   a: 435841403 1416925149  unused      0     0   # Cyl 1383715*- 1809342*
   c: 160171392          0  unused      0     0   # Cyl       0 -  156417*

435841403 + 1416925149 = 1852766552 which is greater than 160171392
by 1692595160.  If you fix the offset of 'a', I suspect things will 
be happier.

Later...

Greg Oster



Re: RAID label problem?

2006-03-22 Thread Greg Oster
Xavier Mertens writes:
 Well, I already tried to create only a small partition:
[snip]
  p
 device: /dev/rraid0c
 type: RAID
 disk: raid
 label: fictitious
 bytes/sector: 512
 sectors/track: 128
 tracks/cylinder: 8
 sectors/cylinder: 1024
 cylinders: 156417
 total sectors: 160171392
 free sectors: 159761792
 rpm: 3600
  
  
   
 16 partitions:
 # sizeoffset  fstype [fsize bsize  cpg]
   a:    409600         0  4.2BSD   2048 16384   16   # Cyl     0 -     399
   c: 160171392         0  unused      0     0        # Cyl     0 - 156417*
  q
 Write new label?: [y] y
 disklabel: ioctl DIOCWDINFO: No space left on device
 disklabel: unable to write label

What does 'raidctl -s raid0' say?  There are not many places in the 
DIOCWDINFO code path where ENOSPC is returned... but one of them is 
in raidstrategy().

Later...

Greg Oster



Re: RAIDframe parity errors and rebuild

2006-03-19 Thread Greg Oster
David Wilk writes:
 this was exactly my thought.  I was hoping someone would have some
 'official' knowledge, or opinion.  I still can't get over having to
 wait several hours for my root partition to become available after an
 improper shutdown.
 
 On 3/18/06, Joachim Schipper [EMAIL PROTECTED] wrote:
  On Sat, Mar 18, 2006 at 12:59:30PM +0200, Antonios Anastasiadis wrote:
   I had the same question, and just changed the relevant line in /etc/rc
    adding '&' at the end:
   
    raidctl -P all &
 
  Then again, why is this not the default? Are you certain this actually
  works?
 
  Joachim

If you want to be 100% paranoid, then you want to wait for the 
'raidctl -P all' to update all parity before starting even fsck's.
There *is* a non-zero chance that the parity might be out-of-sync 
with the data, and should a component die before that parity has been 
updated, then you could end up reading bad data.  This can happen 
even if the filesystem has been checked.  What are the odds of this 
happening?  Pretty small.

If 'raidctl -P all &' is run, then the larger problem is that both fsck 
and raidctl will be fighting for disk cycles -- i.e. the fsck will 
take longer to complete.  On more critical systems, this is how I 
typically have things setup (I'm willing to risk it that I'm not 
going to have a disk die during the minutes that it takes to do the 
fsck).

On less critical boxes, I've got a sleep 3600 before the 'raidctl 
-P', so that the parity check doesn't get in the way of the fsck or 
the system coming up... about an hour after it comes up, the disks 
are then checked...
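
In /etc/rc terms that is something like (a sketch, sh syntax assumed):

 ( sleep 3600 ; raidctl -P all ) &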

It's one of those what are the odds games... allowing the raidctl 
to run in the background seems to have the right mix of paranoia and 
practicality... 

Later...

Greg Oster



Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28

2006-03-15 Thread Greg Oster
Adam PAPAI writes:
 Hello misc,
 
 I have an IBM xSeries 335 machine with Dual Xeon processor and 2x73GB 
 SCSI Seagate Barracuda 10K rpm disc. I run OpenBSD 3.8 on it.
 
 When I'm creating the raid array (raidctl -iv raid0), I get the 
 following error message:
 
 sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
  SENSE KEY: Media Error
   INFO: 0x224c10c (VALID flag on)
   ASC/ASCQ: Read Retries Exhausted
   SKSV: Actual Retry Count: 63
 raid0: IO Error.  Marking /dev/sd0d as failed.
 raid0: node (Rod) returned fail, rolling backward
 Unable to verify raid1 parity: can't read stripe.
 Could not verify parity.

Is this early in the initialization or late in the initialization?

Try doing:

 dd if=/dev/rsd0d of=/dev/null bs=10m 

and see if you get the same error message...  

Later...

Greg Oster



Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28

2006-03-15 Thread Greg Oster
Adam PAPAI writes:
 Greg Oster wrote:
  Adam PAPAI writes:
  
 Hello misc,
 
 I have an IBM xSeries 335 machine with Dual Xeon processor and 2x73GB 
 SCSI Seagate Barracuda 10K rpm disc. I run OpenBSD 3.8 on it.
 
 When I'm creating the raid array (raidctl -iv raid0), I get the 
 following error message:
 
 sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28
  SENSE KEY: Media Error
   INFO: 0x224c10c (VALID flag on)
   ASC/ASCQ: Read Retries Exhausted
   SKSV: Actual Retry Count: 63
 raid0: IO Error.  Marking /dev/sd0d as failed.
 raid0: node (Rod) returned fail, rolling backward
 Unable to verify raid1 parity: can't read stripe.
 Could not verify parity.
  
  
  Is this early in the initialization or late in the initialization?
  
  Try doing:
  
   dd if=/dev/rsd0d of=/dev/null bs=10m 
  
  and see if you get the same error message...  
 
 
 # dd if=/dev/rsd0d of=/dev/null bs=10m
 6977+1 records in
 6977+1 records out
 73160687104 bytes transferred in 1043.771 secs (70092636 bytes/sec)
 # dd if=/dev/rsd1d of=/dev/null bs=10m
 6977+1 records in
 6977+1 records out
 73160687104 bytes transferred in 1027.051 secs (71233712 bytes/sec)
 #
 
 This means no hdd error..

Well... no hdd error for this set of reads... Hm  What if you 
push both drives at the same time:

 dd if=/dev/rsd0d of=/dev/null bs=10m &
 dd if=/dev/rsd1d of=/dev/null bs=10m &

?   (Were the drives warm when you did this test, and/or when the 
original media errors were reported?  Does a 'raidctl -iv raid0' work 
now or does it still trigger an error? )

 Then probably the raidFrame has the problem I guess..

RAIDframe doesn't know anything about SCSI controllers or SCSI errors... 
all it knows about are whatever VOP_STRATEGY() happens to return to 
it from the underlying driver... 

 I have to use /altroot on /dev/sd1a then, or is there a patch for 
 raidframe to fix this?

There is no patch for RAIDframe to fix this.  There is either a 
problem with the hardware (most likely), some sort of BIOS 
configuration issue (is it negotiating the right speed for the 
drive?), or (less likely) a mpt driver issue.  Once you figure out 
what the real problem is and fix it, RAIDframe will work just fine :) 

Later...

Greg Oster



Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28

2006-03-15 Thread Greg Oster
Adam PAPAI writes:
 After reboot my dmesg end:
 
 rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
 Hosed component: /dev/sd0d.
 raid0: Ignoring /dev/sd0d.
 raid0: Component /dev/sd1d being configured at row: 0 col: 1
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
   Version: 2 Serial Number: 100 Mod Counter: 27
   Clean: No Status: 0
 /dev/sd1d is not clean !
 raid0 (root)raid0: no disk label
 raid0: Error re-writing parity!
 
 dd if=/dev/rsd0d of=/dev/null bs=10m 
 dd if=/dev/rsd1d of=/dev/null bs=10m 
 
 was successfully ended.
 
 # raidctl -iv raid0 

What does 'raidctl -s raid0' say?  It probably says that 'sd0d' is 
failed.  You can't initialize parity with 'raidctl -iv' on a set with 
a failed component.  You can do 'raidctl -vR /dev/sd0d raid0' to get 
it to reconstruct back onto the failed component.  After that you can 
do a 'raidctl -iv' (though by that point it's strictly not necessary).

Later...

Greg Oster



Re: raidFrame creating error: sd0(mpt0:0:0): Check Condition (error 0x70) on opcode 0x28

2006-03-15 Thread Greg Oster
Adam PAPAI writes:
 Greg Oster wrote:
  Adam PAPAI writes:
  
 After reboot my dmesg end:
 
 rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
 Hosed component: /dev/sd0d.
 raid0: Ignoring /dev/sd0d.
 raid0: Component /dev/sd1d being configured at row: 0 col: 1
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 2
   Version: 2 Serial Number: 100 Mod Counter: 27
   Clean: No Status: 0
 /dev/sd1d is not clean !
 raid0 (root)raid0: no disk label
 raid0: Error re-writing parity!
 
 dd if=/dev/rsd0d of=/dev/null bs=10m 
 dd if=/dev/rsd1d of=/dev/null bs=10m 
 
 was successfully ended.
 
 # raidctl -iv raid0 
  
  
  wha does 'raidctl -s raid0' say?  It probably says that 'sd0d' is 
  failed.  You can't initialize parity with 'raidctl -iv' on a set with 
  a failed component.  You can do 'raidctl -vR /dev/sd0d raid0' to get 
  it to reconstruct back onto the failed component.  After that you can 
  do a 'raidctl -iv' (though by that point it's strictly not necessary).
 
 Interesting. I tried 3 full reinstalls and raidctl -iv raid0 failed 
 every time, but raidctl -vR /dev/sd0d solved the problem.
 
 But why?

It didn't solve the Media Error... the Media Error just didn't 
show up again.

 Will it be good from now? 

If I had to pick from one of Yes or No, I'd pick No.

 I'm afraid the raid will collapse again. I hope not.
 
 I'm going to continue the setup on my server. Thanks anyway. I hope I 
 won't get more errors...

I hope so too... but nothing in 'raidctl -vR' really fixes media 
errors...  (Since 'raidctl -R' is going to write to sd0, it's possible 
that the drive has now re-mapped whatever bad block was on sd0, and 
sd0 may work fine now... but it's unusual to see the same error on 
2 different drives... makes me maybe suspect cabling too..)

Later...

Greg Oster



Re: RAIDframe question

2006-02-01 Thread Greg Oster
Håkan Olsson writes:
 On 1 feb 2006, at 08.38, Jurjen Oskam wrote:
 
  On Wed, Feb 01, 2006 at 01:19:58AM -0500, Peter wrote:
 
  raid0: Device already configured!
  ioctl (RAIDFRAME_CONFIGURE) failed
 
  Can anyone lend a hand in this important matter?
 
  Let me guess (since you didn't post any configuration): you
  enabled RAID-autoconfiguration by the kernel *and* you
  configure the same RAID-device during the boot sequence using
  raidctl?
 
 /etc/rc includes commands to configure the raid devices, and if  
 they've been setup to use autoconfiguration then this is indeed what  
 happens. Expected and nothing to worry about, although noisy.

What he said.

 For my  
 raidframe devices, I just removed the autoconfigure flag.

Please use the autoconfigure flag.  It is *far* better at gluing 
together a RAID set than the regular configuration bits, especially 
in the face of drives that move about or drives that fail to spin 
up... (the old config code needs to find its way into a bit-bucket..)

You really want to use the autoconfigure bits.. :)  Really. :) 
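
Turning it on is a one-liner per set (assuming raid0 here):

 raidctl -A yes raid0    # enable autoconfiguration for this set
 raidctl -A root raid0   # use this instead if the set should also be root-eligible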

Later...

Greg Oster



Re: RAIDframe question

2006-02-01 Thread Greg Oster
Peter writes:

 I tried unsuccessfully using the same procedure to set up two disks (sd0
 and sd1) attached to a QLogic FibreChannel controller (isp driver).  I
 probably don't have the correct terminology but upon startup the boot code
 could not be found (would not get beyond the point where the kernel
 usually kicks in).  I'm wondering whether RAIDframe has limitations with
 this hardware.

RAIDframe doesn't care about underlying hardware.  It's run on top of 
a) probably every flavour of SCSI, b) various levels of IDE/pciide,
c) FibreChannel, d) ancient things like HP-IB, and e) other RAIDframe
devices.  If the underlying device can provide something that looks/
smells like a disk partition, that's good enough for RAIDframe.

Later...

Greg Oster



Re: RAIDframe question

2006-02-01 Thread Greg Oster
Peter Fraser writes:
 I had a disk drive fail while running RAIDframe.
 The system did not survive the failure. Even worse
 there was data loss.

Ow.  

 The system was to be my new web server. The system
 had 1 Gig of memory.  I was working, slowly, on
 configuring apache and web pages. Moving to
 a chroot'ed environment was non-trivial.
 
 The disk drive died, the system crashed, 

Oh so it *wasn't* just a simple case of a drive dying, but the 
system crashed too...  Well, RAIDframe can't make any guarantees when 
there's a system crash -- if buffers havn't been flushed or there's 
still pending meta-data to be written, there's not much RAIDframe can 
do about that... those are filesystem issues.

 and the
 system rebooted and came up. Remove the
 dead disk and replacing it with a new disk
 and reestablishing the raid was no problem.
 
 But why was there a crash? I would have thought
 that the system should run after a disk failure.

You haven't said what types of disks.  I've had IDE disks fail that 
take down the entire system.  I've had IDE disks fail but the system 
remains up and happy.  I've had SCSI disks fail that have made the 
SCSI cards *very* unhappy (and had the system die shortly after).  
None of these things can be solved by RAIDframe -- if the underlying 
device drivers can't deal in the face of lossage, RAIDframe can't 
do anything about that...

You also haven't given any indication as to the nature of the crash, 
or what the panic message was (if any).  (e.g. was it a null-pointer 
dereference, or a corrupted filesystem or something that went wrong 
in the network stack?)

 And even more to my surprise, about two days
 of my work disappeared.

Of course, you just went to your backups to get that back, right? :)

 I believe, the disk drive died about 2 days before
 the crash. I also believe that RAIDframe did
 not handle the disk drive's failure correctly

Do you have a dmesg related to the drive failure?  e.g. something 
that shows RAIDframe complaining that something was wrong, and 
marking the drive as failed?  

 and as a result all file writes to the failed
 drive queued up in memory,

I've never seen that behaviour...  I find it hard to believe that 
you'd be able to queue up 2 days worth of writes without a) any reads 
being done or b) noticing that the filesystem was completely 
unresponsive when a write of associated meta-data never returned...  
(on the first write of meta-data that didn't return, pretty much all
IO to that filesystem should grind to a halt.  Sorry.. I'm not buying 
the "it queued up things for two days"... )

 when memory ran out the system crashed. 
 
 I don't know enough about OpenBSD internals to
 know if my guess as to what happened is correct,
 but it did worry me about the reliability of
 RAIDframe.

I've been running RAIDframe (albeit not w/ OpenBSD) in both 
production and non-production environments now for 7+ years...  
RAIDframe reliability is the least of my worries :) 
(RAIDframe has also saved mine and others' data on various occasions 
over the years...)
 
 I am now trying ccd for my web pages and 
 ALTROOT in daily for root, I have not had a disk
 fail with ccd yet, so I have not determined whether
 ccd works better.

Good luck.  (see a different thread for my thoughts on using ccd :)

 
 Neither RAIDframe nor ccd seems to be up to the
 quality of nearly all the other software
 in OpenBSD. This statement is also true of the documentation.

My only comment on that is that the version of RAIDframe in OpenBSD 
is somewhat dated.  You are also encouraged to find and read the 
latest versions of the documentation, and to provide feedback
to the author on what you feel is lacking.

Later...

Greg Oster



Re: RAIDframe question

2006-02-01 Thread Greg Oster
Andy Hayward writes:
 On 2/1/06, Greg Oster [EMAIL PROTECTED] wrote:
  Peter Fraser writes:
   and as a result all file writes to the failed
   drive queued up in memory,
 
  I've never seen that behaviour...  I find it hard to believe that
  you'd be able to queue up 2 days worth of writes without a) any reads
  being done or b) noticing that the filesystem was completely
  unresponsive when a write of associated meta-data never returned...
  (on the first write of meta-data that didn't return, pretty much all
  IO to that filesystem should grind to a halt.  Sorry.. I'm not buying
  the "it queued up things for two days"... )
 
 I've seen similar on a machine with a filesystem on a raid-1 partition
 and mounted with softdeps enabled. From what I remember the scenario
 was something like:
 
 * copied 10Gb or so of data to new raid-1 filesystem
 * system then left idle for 30mins or so
 * being an idiot, pulled the wrong plug out of the wall
 * upon reboot, and after raid resync and fsck, most of the copied data
 was no longer there

RAIDframe can only write what it's given.  If, after 30 minutes,
the filesystem layers haven't synced all the data, RAIDframe can't 
do anything about that...  if left idle for 30 minutes, that 
filesystem should have synced itself many times over, to the point 
that fsck shouldn't have found anything to complain about... 

(I strongly suspect you'd see exactly the same behaviour without 
RAIDframe involved here...  I also suspect you wouldn't see the same 
behavior without softdeps, RAIDframe or not.)

Later...

Greg Oster



Re: RAIDframe issues on 3.8

2005-12-07 Thread Greg Oster
Dave Diller writes:
 
 Here's what's been changed in the kernel: not a lot. I don't understand why
 it would panic on a simple reconstruct command.

I might be able to understand, but my crystal ball is at the 
cleaners, and I can't guess what your panic message looked like, nor 
what the traceback was. 

 Here's the current status of the RAID:
[snip]
 I can't fix the parity either now, it fails on both a -i and a -P attempt.

Right.  The RAID set only has one good component.  There's nothing to 
rebuild parity onto with just one good component.

 wd2a's component label looks fine:
[snip]
 wd1a's does not... pretty much every bit of data that could be changed, has
 been somehow.  I suspect this is part of the root reason that raid0 can't deal
 with it, but I can't seem to get it to reinitialize correctly either.
 
 bash-3.00# raidctl -g /dev/wd1a raid0
 Component label for /dev/wd1a:
Row: 16, Column: 24, Num Rows: 1312, Num Columns: 16
Version: 0, Serial Number: 0, Mod Counter: 8
Clean: Yes, Status: 1133920558
sectPerSU: 9775536, SUsPerPU: 9620351, SUsPerRU: 119
Queue size: 2048, blocksize: 8, numBlocks: 5
RAID Level:
Autoconfig: Yes
Root partition: Yes
Last configured as: raid256

Hmmm... I don't understand this... the label should be the same 
as the other, sans the Column and Mod Counter fields. (and possibly 
Clean and Status).

[snip]

 RAIDFRAME: protectedSectors is 64.
 Hosed component: /dev/wd1a.
 Hosed component: /dev/wd1a.
 raid0: Component /dev/wd2a being configured at row: 0 col: 0
  Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
  Version: 2 Serial Number: 100 Mod Counter: 183
  Clean: No Status: 0
 /dev/wd2a is not clean !
 raid0: Ignoring /dev/wd1a.
 RAIDFRAME(RAID Level 1): Using 6 floating recon bufs with no head sep limit.
 raid0 (root)raid0: Error re-writing parity!
 
 
 I agree it's 'hosed', just looking at the component label!  Nice message. :)
 
 If I umount the partition and try to -u(nconfigure) raid0, it kernel panics.

That shouldn't happen either.

 If I -R(econstruct), it kernel panics.  I've messed it up but good.  How do I
 reinitialize wd1a and/or raid0 and/or start over completely?

You'll have to boot without /etc/raid0.conf.  You can then re-do 
the config with -C and then -I again, but that won't help you
when a disk fails and you get the panic messages on trying to 
reconstruct.  (raidctl -R should *not* panic... (in the version
of RAIDframe you're running, it will if it encounters a read error
or a write error while doing the reconstruct, but that's probably a 
different problem)).
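
(Spelled out, that recovery path looks something like this -- the path and 
serial number here are placeholders, pick your own:

  # boot with /etc/raid0.conf moved aside so nothing configures it at boot, then:
  raidctl -C /path/to/raid0.conf raid0  # -C forces the config past the hosed label
  raidctl -I 2005120701 raid0           # stamp fresh component labels
  raidctl -iv raid0                     # re-initialize/rewrite parity

-C is the forcing version of -c, which is what gets you past a component 
label that a plain -c would reject.)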

Oh... and from the obvious bugs department:  In rf_openbsdkintf.c
 
 case RAIDFRAME_GET_COMPONENT_LABEL:

there is a:

 RF_Free( clabel, sizeof(RF_ComponentLabel_t));

missing before the:

 return(EINVAL);

But that won't help with the problem you're describing... (just noticed 
the above as I was perusing the code..)

Later...

Greg Oster



Re: RAIDframe issues on 3.8

2005-12-07 Thread Greg Oster
Dave Diller writes:
 
 
  Oh... and from the obvious bugs department:  In rf_openbsdkintf.c
 
   case RAIDFRAME_GET_COMPONENT_LABEL:
 
  there is a:
 
   RF_Free( clabel, sizeof(RF_ComponentLabel_t));
 
  missing before the:
 
   return(EINVAL);
 
  But that won't help with the problem you're describing... (just noticed
  the above as I was perusing the code..)
 
 Ha! Guess it helps when you wrote the original version, eh?  Nice.

Well... 7 years ago now for some of these bits :)  (and yes, this bug 
is entirely mine :) )

 It definitely seems to be related to issues with rf_openbsdkintf.c though - I
 was just pointed to this bug by the gentleman who opened it a couple of months
 ago:
 
 http://cvs.openbsd.org/cgi-bin/query-pr-wrapper?full=yes&numbers=4508

Ahh.. that one.

 which has the same panic that I'm seeing.  Sorry for not including it
 initially, BTW.  Didn't have an easy way to do that since I'm remote with no
 console.
 
 Resolution was
 
 State-Changed-Why:
 Fixed in revision 1.28 of rf_openbsdkintf.c, thanks for the report
 
 and I'm running
 
 /* $OpenBSD: rf_openbsdkintf.c,v 1.27 2004/11/28 02:47:14 pedro Exp $   */
 
 So, time to resolve that via the latest -stable and try again.

Yup.

 Do you have the cycles to get a bug in queue for the one you spotted on 
 a quick once-over, before someone gets nailed by THAT one?  I could open 
 it, but it
 would merely say "didn't run into the problem, but Greg Oster says it's an
 obvious bug"... ;-)

I mentioned it here since it's an easy one for someone to fix...  You 
can file a problem report if you'd like, but I don't want to get 
started filing PR's for RAIDframe stuff in OpenBSD -- there have been 
a lot of changes/fixes to RAIDframe in the last 5 years that aren't 
reflected in the code in OpenBSD, and I wouldn't know where to begin 
:)

Later...

Greg Oster



Re: Updated CCD Mirroring HOWTO

2005-12-02 Thread Greg Oster
Nick Holland writes:
 Greg Oster wrote:
 ...
  Here's what I'd encourage you (or anyone else) to do:
 
 actually, I'd encourage you to try your own test.  Results were interesting.

Well... as we see, you did *your* version of the test, not mine ;) 

  1) Create a ccd as you describe in the HOWTO and mount the filesystem.
 
 used my own instructions, if you don't mind. :)
 Softdeps on.  That may matter.  Or it may not.  Not sure.

Shouldn't be a big deal either way..

  2) Start extracting 5 copies of src.tar.gz onto the filesystem (
  simultanously is preferred, but basically anything that will generate 
  a lot of IO here is what is needed).
 
 I wussed out here.  Did one unpacking of a Maildir in a .tgz file.  But
 lots of IO, lots of thrashing, disks were basically saturated with work,
 processor was waiting for disk.  Lots of tiny files.  On the other hand,
 that's a lot more activity than this machine will ever see in production.

Um... that's just one thread of IO... 64K (or whatever MAXPHYS is) 
presented, in sequence, to the underlying driver.  A rather boring 
sequence of IO, with not much chance for one disk to get ahead or 
behind the other in terms of servicing requests.  The 5 was there 
for a reason :)  So, actually, was src.tar.gz.  To make things more 
interesting, do a whole mess of reads from the ccd while you're 
doing the 5 extractions (preferably for something that isn't cached). 
(If I were testing this on my machine, I'd likely start with 10 
different copies of src.tar.gz on the ccd, and then extract all 
10 simultaneously (to different destinations on the ccd).  Once that 
was going, I'd then start about 50 dd's of the src.tar.gz files,
each dd starting about 10 seconds after the previous.   When all 
IO had begun, I'd wait a few minutes and *then* pull the rug out 
from under the system.  But I didn't expect anyone to push their system 
that hard for this test, and so went with 5, and just one copy of 
src.tar.gz in an unspecified location :) )
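
(In rough shell terms -- the mount point and file names here are invented, 
the counts are the ones above:

  # 10 simultaneous extractions of 10 copies of src.tar.gz already on the ccd
  i=1
  while [ $i -le 10 ]; do
      mkdir -p /mnt/ccd/dest$i
      ( cd /mnt/ccd/dest$i && tar xzf /mnt/ccd/src$i.tar.gz ) &
      i=$((i + 1))
  done
  # ...plus 50 readers, staggered 10 seconds apart
  i=1
  while [ $i -le 50 ]; do
      dd if=/mnt/ccd/src$(( (i % 10) + 1 )).tar.gz of=/dev/null bs=64k &
      sleep 10
      i=$((i + 1))
  done

then let it all churn for a few minutes before pulling the plug.)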

 My first (and second) test was copying the 86M .tgz file, but that was
 horribly uninteresting.  Resetting the machine well into the copy
 resulted in a zero-byte file after fsck.  Truncated.  Not a big
 surprise, really.
 
  3) After that's been going for a while, and while still in progress, 
  pull the power from the machine.
 
 Drop power mid-write and you are risking your disk.  Yes, I have spiked
 disks with a nail gun to test RAID in the past, but didn't feel like
 possibly toasting two disks by powering down the machine mid-write at
 this time.  This system has purpose for me. :)

Heh.. my RAID test box has a disk in an external case.. disk 'failure' 
is simulated by powering off that case... I don't know how many power 
outages that poor little disk has seen :) 
 
 So, I hit the reset button on the machine.  That should give something
 similar to (though admittedly, not identical to) a crash.

Yes, should suffice for this test ...

 No, hitting the reset is NOT the same as a power outage.  It isn't the
 same as a crash either -- in the latter case, I'm going to say that it is
 just different, not easier or harder...so my test is only one kind of
 failure (and I REALLY didn't feel like pulling a memory module out to
 simulate a HW failure... :)
 
  4) Fire the machine back up, configure the ccd again, and run fsck a 
 few times to make sure the ccd filesystem is clean.
 
 once did the job.  Second fsck came up clean.  Don't expect different
 results on the third or fourth...
 
  5) Now unconfigure the ccd.
 
 mounted each separately as a non-mirrored ccd file system.
 
  6) Do an md5 checksum of each of the parts of the mirror, and see if 
  they differ.  (they shouldn't, but I bet they do!!)
 
 I think the md5 test of the mirror elements is bogus here.
 I don't care if an unallocated block is different. I care if the files
 are different.  I might not even care about that much.  See below...

Umm... There is still a non-zero chance that metadata on one disk 
will be different than metadata on the other, or that data on one 
disk will be different than the other...

  If they differ, tell me how ccd detected that difference, and how it 
  warned you that if the primary drive died that you'd have incorrect 
  data.  If they don't differ, go buy a lottery ticket, cause it's
  your lucky day! ;) 
 
 I used diff(1) to compare the two trees created by splitting the mirror.
 
 No difference found.  i.e., ccd(4) mirroring passed a somewhat
 simplified version of your test.  I even modified one of the files to
 make sure I didn't blow the diff command usage...  188M of files in the
 tree, no differences.
 
 I will admit I was pleasantly surprised, though not totally shocked that
 it did.

With only one IO thread, I'm not overly surprised with these results...
 
 My first clue was what happened when I tried to interrupt the copy of a
 single very large file to the ccd(4) file system.  Even though many
 megabytes had been transferred, by the time fsck

Re: Updated CCD Mirroring HOWTO

2005-11-29 Thread Greg Oster
 right is sometimes a bit more work... :)

 CCD is easy to set up (once you figure 
 out the steps) and I think it provides some protection against harddisk 
 failures.

There is *some* protection, provided one can guarantee the mirrors 
are in-sync at ccd configuration time. 

Here's what I'd encourage you (or anyone else) to do:

1) Create a ccd as you describe in the HOWTO and mount the filesystem.
2) Start extracting 5 copies of src.tar.gz onto the filesystem (
simultanously is preferred, but basically anything that will generate 
a lot of IO here is what is needed).
3) After that's been going for a while, and while still in progress, 
pull the power from the machine.
4) Fire the machine back up, configure the ccd again, and run fsck a 
   few times to make sure the ccd filesystem is clean.
5) Now unconfigure the ccd.
6) Do an md5 checksum of each of the parts of the mirror, and see if 
they differ.  (they shouldn't, but I bet they do!!)

If they differ, tell me how ccd detected that difference, and how it 
warned you that if the primary drive died that you'd have incorrect 
data.  If they don't differ, go buy a lottery ticket, cause it's
your lucky day! ;) 
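
(Concretely, for steps 5 and 6 -- the wd devices here are placeholders for 
whatever the ccd was actually built from:

  umount /mnt/ccd
  ccdconfig -u ccd0
  md5 /dev/rwd1d /dev/rwd2d

If ccd really kept the two halves of the mirror identical, those two 
checksums will match.)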

Later...

Greg Oster