Bob Friesenhahn wrote:
> On Mon, 28 Jul 2008, BG wrote:
>
>   
>> Indeed, one of the nice things about ZFS is that it is picky about data
>> and alerts you immediately. Before, some files became corrupt and one was
>> left wondering what happened and how this was possible, since everything
>> had seemed fine for months :)
>>     
>
> Unfortunately, ZFS does not detect or correct memory errors.  Memory
> reliability is currently an Achilles' heel for ZFS, and it undermines
> MTTDL (mean time to data loss) models that are based on disk media
> reliability alone.
>   

We can (and do) model systems complete with the data path from
CPU to memory to PCI* to HBA to disk and back.  Basically, the
results will show that you want ECC memory and PCI-Express as
major technology components.  FWIW, Sun no longer sells computers
without ECC memory.
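
As a rough illustration of what an end-to-end model involves, here is a
minimal sketch of a series model in which every component on the data
path must work.  It assumes independent components with constant
failure rates, which is a simplification and not necessarily the model
we use.  The function names and the rates are hypothetical placeholders,
not measurements.

```python
def series_failure_rate(rates_per_hour):
    """Combined failure rate of a series system.

    Assumes independent components with constant failure rates, so the
    per-component rates simply add.
    """
    return sum(rates_per_hour.values())


def mean_time_to_failure(rates_per_hour):
    """Mean time to failure (hours) of the whole path under the same assumptions."""
    return 1.0 / series_failure_rate(rates_per_hour)


def dominant_component(rates_per_hour):
    """Name of the component that contributes the largest share of the total rate."""
    return max(rates_per_hour, key=rates_per_hour.get)


# Placeholder rates (failures per hour), chosen only to show the arithmetic.
example_path = {"cpu": 1e-7, "memory": 2e-6, "pci": 1e-7, "hba": 2e-7, "disk": 1e-6}

if __name__ == "__main__":
    print("path MTTF (hours):", mean_time_to_failure(example_path))
    print("largest contributor:", dominant_component(example_path))
```

The point of the sketch is only that a model restricted to the disk term
ignores every other entry in the sum, so it can overstate the
reliability of the path as a whole.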

But ZFS can do better.  I filed CR6674679, which basically says
that if redundant copies of data have the same, wrong checksum,
then ZFS should issue an e-report to that effect.  This will allow
you to move suspicion away from the disks as a root cause and towards
a common cause, like memory, a shared HBA or bus, etc.  It won't
be able to recover the data, but it can help debug the system.
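
The check described above could be sketched roughly as follows.  This is
an illustration of the idea only, not ZFS code: the function name, the
result strings, and the use of SHA-256 are all assumptions made for the
example.

```python
import hashlib


def classify_copies(expected_checksum, copies):
    """Classify a read of redundant copies against the stored checksum.

    expected_checksum: hex digest recorded when the block was written.
    copies: list of bytes objects, one per redundant copy.
    Hypothetical helper for illustration; not part of ZFS.
    """
    actual = [hashlib.sha256(data).hexdigest() for data in copies]

    if all(digest == expected_checksum for digest in actual):
        return "ok"
    if any(digest == expected_checksum for digest in actual):
        # At least one good copy exists, so the bad ones can be repaired from it.
        return "repairable"
    if len(copies) > 1 and len(set(actual)) == 1:
        # Every copy is wrong in exactly the same way.  Independent disk
        # faults are unlikely to produce identical corruption, so suspect
        # something shared: memory, an HBA, a bus.
        return "common-cause-suspected"
    # All copies are wrong, but in different ways.
    return "unrecoverable"


if __name__ == "__main__":
    good = b"some block contents"
    bad = b"some block c0ntents"
    expected = hashlib.sha256(good).hexdigest()
    print(classify_copies(expected, [bad, bad]))
```

In the "common-cause-suspected" case the data is still lost, but the
report points the investigation away from the individual disks.
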
 -- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
