Re: [zfs-discuss] Re: Production ZFS Server Death (06/06)

Al Hopper Sat, 02 Dec 2006 09:52:35 -0800

On Sat, 2 Dec 2006, Chad Leigh -- Shire.Net LLC wrote:

>
> On Dec 2, 2006, at 12:06 AM, Ian Collins wrote:
>
> > Chad Leigh -- Shire.Net LLC wrote:
> >
> >>
> >> On Dec 1, 2006, at 10:17 PM, Ian Collins wrote:
> >>
> >>> Chad Leigh -- Shire.Net LLC wrote:
> >>>
> >>>> There is not?  People buy disk drives and expect them to corrupt
> >>>> their data?  I expect the drives I buy to work fine (knowing that
> >>>> there could be bugs etc in them, the same as with my RAID systems).
> >>>>
> >>> So you trust your important data to a single drive?  I doubt it.
> >>> But I bet you do trust your data to a hardware RAID array.
> >>
> >>
> >> Yes, but not because I expect a single drive to be more error prone
> >> (versus total failure).  Total drive failure on a single disk loses
> >> all your data.  But we are not talking total failure, we are talking
> >> errors that corrupt data.  I buy individual drives with the
> >> expectation that they are designed to be error free and are error
> >> free for the most part and I do not expect a RAID array to be more
> >> robust in this regard (after all, the RAID is made up of a bunch of
> >> single drives).
> >>
> > But people expect RAID to protect them from the corruption caused by a
> > partial failure, say a bad block, which is a common failure mode.
>
> They do?  I must admit no experience with the big standalone raid
> array storage units, just (expensive) HW raid cards, but I have never
> expected an array to protect me against data corruption.  Bad blocks
> can be detected and remapped, and maybe the array can recalculate the
> block from parity etc, but that is a known disk error, and not the
> subtle kinds of errors created by the RAID array that are being
> claimed here.
>
> >   The
> > worst system failure I experienced was caused by one half of a mirror
> > experiencing bad blocks and the corrupt data being nicely mirrored on
> > the other drive.  ZFS would have saved this system from failure.
>
> None of my comments are meant to denigrate ZFS.  I am implementing it
> myself.
>
> >
> >> Some people on this list think that the RAID arrays are more likely
> >> to corrupt your data than JBOD (both with ZFS on top, for example, a
> >> ZFS mirror of 2 raid arrays or a JBOD mirror or raidz).  There is no
> >> proof of this or even reasonable hypothetical explanation for this
> >> that I have seen presented.
> >>
> > I don't think that's the issue here; it's more one of perceived data
> > integrity.  People who have been happily using a single RAID 5 are now
> > finding that the array has been silently corrupting their data.
>
> They are?  They are being told that the problems they are having are
> due to that but there is no proof.  It could be a bad driver for
> example.
>
> > People
> > expect errors from single drives,
>
> They do?  The tech specs show very low failure rates for single
> drives in terms of bit errors.
>
> > so they put them in a RAID knowing the
> > firmware will protect them from drive errors.
>
> The RAID firmware will not protect them from bit errors on block
> reads unless the disk detects that the whole block is bad.  I admit
> not knowing how much the disk itself can detect bit errors with CRC
> or similar sorts of things.


This is incorrect.  Let's take a simple example of a H/W RAID5 with 4 disk
drives.  If disk 1 returns a bad block when a stripe of data is read (and
does not indicate an error condition), the RAID firmware will calculate
the parity/CRC for the entire stripe (as it *always* does), "see" that
there is an error present and transparently correct it before returning
the corrected data upstream to the application (server).  It can't correct
every possible error - there will be limits depending on which CRC
algorithms are implemented and the extent of the faulty data.  But, in
general, those algorithms, if correctly chosen and implemented, will
correct most errors, most of the time.
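
To make the parity arithmetic concrete, here is a minimal sketch, assuming
plain XOR parity over three data blocks plus one parity block (the 4-disk
case above).  The function names are made up for illustration and do not
correspond to any vendor's firmware.  Note that XOR parity by itself only
shows that a stripe is inconsistent and can rebuild a block once the faulty
disk is known; working out *which* disk returned bad data needs extra
information, such as a per-block checksum.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity):
    """True if the stored parity matches the parity of the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_block(data_blocks, parity, bad_index):
    """Recompute the block at bad_index from the surviving blocks and parity."""
    survivors = [b for i, b in enumerate(data_blocks) if i != bad_index]
    return xor_blocks(survivors + [parity])


# Three data "disks" and one parity "disk" (hypothetical 4-byte blocks).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Disk 0 silently returns a damaged block without reporting an error.
damaged = list(data)
damaged[0] = b"AAXA"

print(stripe_is_consistent(data, parity))     # healthy stripe
print(stripe_is_consistent(damaged, parity))  # damaged stripe
print(rebuild_block(damaged, parity, 0))      # block 0 rebuilt from the rest
```

The rebuild step takes the index of the bad block as an input; the sketch
does not attempt to discover it.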

The main reason why not *all* possible errors can be corrected is that
there are compromises to be made in:

- the number of bits of CRC that will be calculated and stored
- the CPU and memory resources required to perform the CRC calculations
- limitations in the architecture of the RAID h/w, for example, how much
  bandwidth is available between the CPU, memory, disk I/O controllers and
  what level of bus contention can be tolerated
- whether the RAID vendor wishes to make any money (hardware costs must be
  minimized)
- whether the RAID vendor wishes to win benchmarking comparisons with
  their competition
- how smart the firmware developers are and how much pressure is put on
  them to get the product to market
- blah, blah, blah
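
The first trade-off in the list (how many check bits are stored) can be
illustrated with a rough sketch.  It assumes a toy check code made by
keeping only the low bits of CRC-32 from Python's zlib module; real firmware
would use a purpose-designed code, and the function names, block size and
trial count here are arbitrary choices for illustration only.

```python
import random
import zlib


def check_bits(block, bits):
    """Keep only the low `bits` bits of CRC-32 as a toy short checksum."""
    return zlib.crc32(block) & ((1 << bits) - 1)


def undetected_corruptions(bits, trials=2000, block_size=64, seed=1):
    """Count single-byte corruptions that leave the short checksum unchanged."""
    rng = random.Random(seed)
    missed = 0
    for _ in range(trials):
        block = bytearray(rng.getrandbits(8) for _ in range(block_size))
        stored = check_bits(bytes(block), bits)
        # Flip at least one bit in one randomly chosen byte.
        block[rng.randrange(block_size)] ^= rng.randrange(1, 256)
        if check_bits(bytes(block), bits) == stored:
            missed += 1
    return missed


for bits in (4, 8, 16, 32):
    print(bits, "check bits:", undetected_corruptions(bits), "missed")
```

Fewer check bits cost less to compute and store, but let more corrupted
blocks slip through unnoticed; the full 32-bit CRC catches every
single-byte corruption in this experiment.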

> > They often fail to
> > recognise that the RAID firmware may not be perfect.
>
> ZFS, JBOD disk controllers, drivers for said disk controllers, etc
> may not be perfect either.
>
> >
> > ZFS looks to be the perfect tool for mirroring hardware RAID arrays,
> > with the advantage over other schemes of knowing which side of the
> > mirror has an error.  Thus ZFS can be used as a tool to complement,
> > rather than replace hardware RAID.
>
> I agree.  That is what I am doing :-)
>
> Chad
>
> >
> > Ian
> >
>
> ---
> Chad Leigh -- Shire.Net LLC
> Your Web App and Email hosting provider
> chad at shire.net
>
>
>
>

Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
           Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
             OpenSolaris Governing Board (OGB) Member - Feb 2006
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
