On Mon, 28 Jul 2008, BG wrote:

> Indeed, that's one of the nice things: ZFS is picky about data and
> alerts you immediately. Before, some files became corrupt and one was
> left wondering what happened and how this was possible, since
> everything had seemed fine for months :)

Unfortunately, ZFS does not detect or correct memory errors. Memory reliability is currently an Achilles' heel for ZFS, and it undermines MTTDL (mean time to data loss) models that are based on disk media reliability alone.

Consider that in servers, the ZFS ARC (which holds a copy of often-accessed data) will often grow to consume most of the system RAM. This growth mostly occurs after the server daemons have been successfully started. Large servers can include lots of RAM, most of which is used for caching.

All data read or written passes through RAM. Any error that corrupts data before ZFS has checksummed it will cause silent data corruption on disk. If data in RAM becomes corrupt between the time it is checksummed and the time it is written to disk, then ZFS will detect the problem, but the data will be corrupt and unrecoverable. Likewise, if the ZFS ARC returns corrupted data to an application, the application may then write a (possibly modified) version of this corrupted data to disk.

Reliable memory is imperative, and without ECC, memory read errors can go undetected for a long time. Even with ECC it is possible to experience undetected or uncorrected memory errors, but Solaris includes a very good fault management system for ECC memory, so it avoids memory chips that produce many detectable read errors.

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
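
The two failure windows described in the message (corruption before the checksum is computed versus corruption after the checksum but before the write) can be illustrated with a small toy model. This is only a sketch and not ZFS code: the helper names (`checksum`, `flip_bit`, `write_block`, `verify`) are hypothetical, SHA-256 merely stands in for whatever checksum the pool uses, and a single flipped bit stands in for a memory error.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # Stand-in for the filesystem's block checksum (assumption: SHA-256).
    return hashlib.sha256(block).digest()


def flip_bit(block: bytes, bit_index: int) -> bytes:
    # Simulate a single-bit memory error in a buffer held in RAM.
    buf = bytearray(block)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def write_block(data: bytes, corrupt_before: bool = False,
                corrupt_after: bool = False):
    # Toy write path: returns (stored_data, stored_checksum).
    if corrupt_before:
        # RAM error before checksumming: the checksum covers bad data.
        data = flip_bit(data, 0)
    stored_sum = checksum(data)
    if corrupt_after:
        # RAM error after checksumming but before the data reaches disk.
        data = flip_bit(data, 0)
    return data, stored_sum


def verify(stored_data: bytes, stored_sum: bytes) -> bool:
    # Toy read path: recompute the checksum and compare.
    return checksum(stored_data) == stored_sum


original = b"important application data"

clean = write_block(original)
silent = write_block(original, corrupt_before=True)
detected = write_block(original, corrupt_after=True)

for name, (stored, stored_sum) in [("clean", clean),
                                   ("corrupt before checksum", silent),
                                   ("corrupt after checksum", detected)]:
    print(name,
          "| checksum verifies:", verify(stored, stored_sum),
          "| data intact:", stored == original)
```

In this model the "corrupt before checksum" case verifies cleanly even though the stored bytes differ from the original, which is the silent corruption case. The "corrupt after checksum" case fails verification, so the damage is detected, but no good copy of that block was ever written.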