On Tue, 2006-10-10 at 07:33 +1000, Neil Brown wrote:
> On Monday October 9, [EMAIL PROTECTED] wrote:
> >
> > The original email was about raid1 and the fact that reads from
> > different disks could return different data.
>
> To be fair, the original mail didn't mention "raid1" at all. It did
> mention raid5 and raid6 as a possible contrast, so you could reasonably
> get the impression that it was talking about raid1. But that wasn't
> stated.
OK, well I got that impression from the contrast ;-)
> Otherwise I agree. There is no real need to perform the sync of a
> raid1 at creation.
> However it seems to be a good idea to regularly 'check' an array to
> make sure that all blocks on all disks get read to find sleeping bad
> blocks early. If you didn't sync first, then every check will find
> lots of errors. Of course you could 'repair' instead of 'check'. Or
> do that once. Or something.
>
> For raid6 it is also safe to not sync first, though with the same
> caveat as raid1. Raid6 always updates parity by reading all blocks in
> the stripe that aren't known and calculating P and Q. So the first
> write to a stripe will make P and Q correct for that stripe.
> This is current behaviour. I don't think I can guarantee it will
> never change.
>
> For raid5 it is NOT safe to skip the initial sync. It is possible for
> all updates to be "read-modify-write" updates which assume the parity
> is correct. If it is wrong, it stays wrong. Then when you lose a
> drive, the parity blocks are wrong so the data you recover using them
> is wrong.
You could make md aware of that state, though: if
superblock->init_flag == FALSE, then make every write a parity-generating
write rather than a parity-updating one (less efficient, so you would want
to resync the array and clear this up soon, but possible).
> In summary, it is safe to use --assume-clean on a raid1 or raid10,
> though I would recommend a "repair" before too long. For other raid
> levels it is best avoided.
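For anyone wanting to follow that advice, something like this would do it (a sketch only, run as root; the device names are examples, not from this thread):

```shell
# Create a raid1 without the initial resync, then schedule a "repair"
# later so both mirrors become byte-identical before the first periodic
# "check" run starts reporting a pile of mismatches.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --assume-clean /dev/sda1 /dev/sdb1

# Later, at a quiet time:
echo repair > /sys/block/md0/md/sync_action
cat /sys/block/md0/md/mismatch_cnt   # mismatches found during the pass
```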
>
> >
> > Probably the best thing to do would be, on create of the array, to set
> > up a large all-0 block of mem and repeatedly write that to all blocks
> > in the array devices except parity blocks, and use a large all-1 block
> > for those.
>
> No, you would want 0 for the parity block too. 0 + 0 = 0.
Sorry, I was thinking odd parity.
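Neil's point is easy to check: with XOR (even) parity, a stripe of all-zero data already has the correct parity, namely zero. A quick sketch (the 3-disk, 16-byte stripe is just an example):

```python
# XOR (even) parity over all-zero data blocks is itself zero, so
# zero-filling the parity blocks too leaves the stripe consistent.
from functools import reduce
from operator import xor

zero_stripe = [bytes(16)] * 3      # three 16-byte all-zero data blocks
parity = bytes(reduce(xor, column) for column in zip(*zero_stripe))
print(parity == bytes(16))         # True: parity of zeros is zero

# Odd parity would instead require all-ones parity blocks, which is
# what the "all 1 block" suggestion above was assuming.
```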
> > Then you could just write the entire array at blinding speed. You could
> > call that the "quick-init" option or something. You wouldn't be able to
> > use the array until it was done, but it would be quick.
>
> I doubt you would notice it being faster than the current
> resync/recovery that happens on creation. We go at device-speed -
> either the bus or the storage device, depending on which is
> slower.
There's memory overhead, though, and that can impact other operations the
CPU might be doing while the array is recovering.
>
> > If you wanted
> > to be *really* fast, at least for SCSI drives you could write one large
> > chunk of 0's and one large chunk of 1's at the first parity block, then
> > use the SCSI COPY command to copy the 0 chunk everywhere it needs to go,
> > and likewise for the parity chunk, and avoid transferring the data over
> > the SCSI bus more than once.
>
> Yes, that might be measurably faster. It is the sort of thing you might
> do in a "hardware" RAID controller but I doubt it would ever get done
> in md (there is a price for being very general).
Bleh... sometimes I really dislike always catering to the lowest common
denominator: you're never as good as you could be, and you're always as
bad as the worst case...
--
Doug Ledford <[EMAIL PROTECTED]>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
