On Mon, 28 Jul 2008, BG wrote:

> Indeed, one of the nice things about ZFS is that it is picky about 
> data and alerts you immediately. Before, some files became corrupt 
> and one was left wondering what happened and how this was possible, 
> since everything had seemed fine for months :)

Unfortunately, ZFS does not detect or correct memory errors.  Memory 
reliability is currently an Achilles' heel for ZFS, and it undermines 
MTTDL (mean time to data loss) models that are based on disk media 
reliability alone.

Consider that in servers, the ZFS ARC (containing a copy of 
often-accessed data) will often grow to consume most of the system 
RAM.  This growth mostly occurs after server daemons have been 
successfully started.  Large servers can include lots of RAM, most of 
which is used for caching.  All of the data read or written passes 
through RAM.  Any error which corrupts data before it has been 
checksummed by ZFS will cause silent data corruption on disk.  If data 
in RAM becomes corrupt between the time that it is checksummed and the 
time that it is written to disk, then ZFS will detect the problem on a 
later read, but the data 
will be corrupt and unrecoverable.  Likewise, if the ZFS ARC returns 
corrupted data to an application, the application may then write a 
(possibly) modified version of this corrupted data to disk.
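
To make the two failure windows above concrete, here is a minimal 
sketch in Python.  It is not ZFS code: write_block(), read_verifies() 
and flip_bit() are hypothetical helpers invented for this 
illustration, and SHA-256 merely stands in for whichever checksum the 
pool is configured to use.  A single bit flip is injected either 
before or after the checksum is computed.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for the block checksum; SHA-256 is chosen for illustration only.
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit: int) -> bytes:
    # Simulate a single-bit memory error in a buffer held in RAM.
    corrupted = bytearray(block)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


def write_block(data: bytes, corrupt_before: bool = False,
                corrupt_after: bool = False):
    # Hypothetical write path: returns (data, checksum) as stored "on disk".
    if corrupt_before:
        data = flip_bit(data, 3)   # error before checksumming
    cksum = checksum(data)
    if corrupt_after:
        data = flip_bit(data, 3)   # error between checksum and write
    return data, cksum


def read_verifies(stored) -> bool:
    # Hypothetical read path: recompute the checksum and compare.
    data, cksum = stored
    return checksum(data) == cksum


original = b"application data"
clean = write_block(original)
early = write_block(original, corrupt_before=True)
late = write_block(original, corrupt_after=True)

for name, stored in (("clean", clean), ("early", early), ("late", late)):
    print(name, "verifies:", read_verifies(stored),
          "intact:", stored[0] == original)
```

In this sketch the "early" block verifies even though its contents 
are wrong, which is the silent-corruption case.  The "late" block 
fails verification, so the error is detected, but the stored copy is 
already bad.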

Reliable memory is imperative.  Without ECC, memory read errors can 
go undetected for a long time.  Even with ECC it is possible to 
experience undetected or uncorrected memory errors, but Solaris 
includes a very good fault management system for ECC memory, so it 
avoids using memory chips which produce many detectable read errors.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss