----- Original Message -----
From: Brian Wilson <[EMAIL PROTECTED]>
Date: Saturday, June 14, 2008 12:12 pm
Subject: Re: [zfs-discuss] zpool with RAID-5 from intelligent storage arrays
To: Bob Friesenhahn <[EMAIL PROTECTED]>
Cc: zfs-discuss@opensolaris.org


> > On Sat, 14 Jun 2008, zfsmonk wrote:
> > 
> > > Mentioned on 
> > > http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide 
> > > is the following: "ZFS works well with storage based protected LUNs 
> > > (RAID-5 or mirrored LUNs from intelligent storage arrays). However, 
> > > ZFS cannot heal corrupted blocks that are detected by ZFS checksums."
> > 
> > This basically means that the checksum itself is not sufficient to 
> > accomplish correction.  However, if ZFS-level RAID is used, the correct 
> > block can be obtained from a redundant copy.
> > 
> > > Based upon that, if we have LUNs already in RAID5 being served from 
> > > intelligent storage arrays, is there any benefit to creating the 
> > > zpool as a mirror if ZFS can't heal any corrupted blocks? Or would 
> > > we just be wasting disk space?
> > 
> > This is a matter of opinion.  If ZFS does not have access to 
> > redundancy then it cannot correct any problems that it encounters, 
> > and could even panic the system, or the entire pool could be lost. 
> > However, if the storage array and all associated drivers, adaptors, 
> > memory, and links are working correctly, then this risk may be 
> > acceptable (to you).
> > 
> > ZFS experts at Sun say that even the best storage arrays may not 
> > detect and correct some problems, and that complex systems can 
> > produce errors even though all of their components seem to be working 
> > correctly.  This is in spite of Sun also making a living by selling 
> > such products.  The storage array is only able to correct errors it 
> > detects due to the hardware reporting an unrecoverable error 
> > condition or by double-checking using data on a different drive.  
> > Since storage arrays want to be fast, they are likely to engage 
> > additional validity checks/correction only after a problem has 
> > already been reported (or during a scrub/resilver) rather than as a 
> > matter of course.
> > 
> > A problem which may occur is that your storage array may say that 
> > the data is good while ZFS says that there is bad data.  Under these 
> > conditions there might not be a reasonable way to correct the 
> > problem other than to lose the data.  If the zfs pool requires the 
> > failed data in order to operate, then the entire pool could be lost.
> 
> Couple of questions on this topic - 
> 
> What's the percentage of data in a zpool that, if it gets one of these 
> bit corruption errors, will actually cause the zpool to fail?  Is it a 
> higher or lower percentage than what it would take to fatally and 
> irrevocably corrupt UFS or VxFS to the point where a restore is 
> required? 
> 
> Given that today's storage arrays catch a good percentage of errors 
> and correct them (for the intelligent arrays I have in mind, anyway), 
> are we talking about the nasty, silent corruption I've been reading 
> about that occurs in huge datasets, where the RAID thinks the data is 
> good but it's actually garbage?  From what I remember reading, that's 
> a low occurrence rate that only became noticeable because we're 
> dealing in such large amounts of data these days.  Am I wrong here?
> 
> So, looking at making operational decisions in the short term, I have 
> to ask specifically: is it more or less likely that a zpool will die 
> and have to be restored than UFS or VxFS filesystems on a VxVM volume?
> 

To put it specifically:
I currently have a volume (a bunch of them, actually) on one intelligent 
array, as UFS or VxFS filesystems on VxVM volumes.
If I go to ZFS, I'm not intending at this point to mirror it to another array 
(which may or may not exist) and double my use of expensive disk.  So that 
puts me at the risk described here, where the zpool could go poof.
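
Just to make the two setups concrete, here's a rough sketch (the pool name 
and LUN device paths below are made up for illustration, not my actual 
config):

  # concatenated array LUNs, no ZFS-level redundancy: checksums can detect 
  # corruption, but there's no second copy for ZFS to repair it from
  zpool create tank c2t0d0 c2t1d0

  # ZFS-level mirror of LUNs from two arrays: corrupted blocks can be 
  # healed from the other side, at the cost of doubling the expensive disk
  zpool create tank mirror c2t0d0 c3t0d0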

What are the odds, in that zpool configuration (no mirroring, just using the 
intelligent disk as concatenated LUNs in the zpool), that if we hit this silent 
corruption, the whole zpool dies?
If anyone knows, what are the comparative odds of the VxVM volume and its UFS 
or VxFS filesystem similarly dying in the same scenario?
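
(For what it's worth, I'd expect that silent corruption to surface under ZFS 
via a scrub, something like the following, with "tank" again being a made-up 
pool name:

  zpool scrub tank
  zpool status -v tank

whereas UFS and VxFS don't give me an equivalent end-to-end check of the data.)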

Thanks!
Brian


> My opinions and questions are my own, and do not necessarily represent 
> those of my employer. (or my coworkers, or anyone else)
> 
> cheers,
> Brian
> 
> > Bob
> > ======================================
> > Bob Friesenhahn
> > [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
> > GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
> > 
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
