Re: [zfs-discuss] Yager on ZFS

Jonathan Edwards Wed, 05 Dec 2007 21:03:24 -0800

apologies in advance for prolonging this thread .. i had considered  
taking this completely offline, but thought of a few people at least  
who might find this discussion somewhat interesting .. at the least i  
haven't seen any mention of Merkle trees yet, which the nerd in me  
yearns for

On Dec 5, 2007, at 19:42, bill todd - aka can you guess? wrote:

>> what are you terming as "ZFS' incremental risk reduction"? ..  
>> (seems like a leading statement toward a particular assumption)
>
> Primarily its checksumming features, since other open source  
> solutions support simple disk scrubbing (which given its ability to  
> catch most deteriorating disk sectors before they become unreadable  
> probably has a greater effect on reliability than checksums in any  
> environment where the hardware hasn't been slapped together so  
> sloppily that connections are flaky).

ah .. okay - at first reading "incremental risk reduction" seems to  
imply an incomplete approach to risk .. putting the pride of various  
creators and marketing organizations aside for a moment, i don't see  
ZFS as a complete risk reduction - nor should it be billed as such.  
However i do believe that the use of a merkle tree with a sha256 hash  
is an improvement over conventional volume based data scrubbing  
techniques, since the hash tree that describes the filesystem block  
layout doubles as a hierarchical data validation method.  In addition  
to finding previously unknown bad areas with the scrub, you're also  
doing relatively inexpensive data validation checks on every read.
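
To make the per-read validation concrete, here is a minimal Python sketch of a generic Merkle tree over data blocks.  It is an illustration only and not the ZFS on-disk format: the names build_tree and verify_read are hypothetical, and the tree is a plain binary one built over an in-memory list.  The point is that a single block can be checked against one trusted root hash without rereading its neighbours.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the hash levels, leaf hashes first and the root level last."""
    level = [sha256(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent hashes its (up to two) children concatenated.
        level = [sha256(b"".join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def verify_read(block, index, levels, trusted_root):
    """Recompute the path from one block up to the root and compare."""
    digest = sha256(block)
    for level in levels[:-1]:
        start = index - (index % 2)
        siblings = list(level[start:start + 2])
        # Use the hash we just computed, not the stored one, for this node.
        siblings[index - start] = digest
        digest = sha256(b"".join(siblings))
        index //= 2
    return digest == trusted_root

blocks = [b"block-0", b"block-1", b"block-2"]
levels = build_tree(blocks)
root = levels[-1][0]
good_read = verify_read(blocks[1], 1, levels, root)
bad_read = verify_read(b"bit-rotted", 1, levels, root)
```

Here good_read is True and bad_read is False, and each check costs only a handful of hashes along the path to the root.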

> Aside from the problems that scrubbing handles (and you need  
> scrubbing even if you have checksums, because scrubbing is what  
> helps you *avoid* data loss rather than just discover it after it's  
> too late to do anything about it), and aside from problems deriving  
> from sloppy assembly (which tend to become obvious fairly quickly,  
> though it's certainly possible for some to be more subtle),  
> checksums primarily catch things like bugs in storage firmware and  
> otherwise undetected disk read errors (which occur orders of  
> magnitude less frequently than uncorrectable read errors).

sure - we've seen many transport errors, as well as firmware  
implementation errors .. in fact with many arrays we've seen data  
corruption issues with the scrub (particularly if the checksum is  
stored only once, alongside the data block) - just as with spam  
filtering, you really want to eliminate false positives that flag  
corruption where there isn't any.  if you take some time to read the  
on disk format for ZFS you'll see that there's a tradeoff made in  
favor of storing more checksums in many different areas instead of  
making more room for direct block pointers.
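
As a rough illustration of why it matters where the checksum lives, the Python sketch below contrasts a checksum kept next to the data with one kept in the parent's pointer to the block.  Everything in it is hypothetical (the disk dict, the helper names, the address 7), and it models only one failure: a write that is acknowledged but never reaches the media.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Layout A: the checksum is stored once, right next to the data.
def write_inline(disk, addr, data):
    disk[addr] = (data, checksum(data))

def read_inline_ok(disk, addr):
    data, stored = disk[addr]
    return checksum(data) == stored

# Layout B: the checksum is kept in the parent's pointer to the block.
def make_pointer(addr, data):
    return {"addr": addr, "checksum": checksum(data)}

def read_pointer_ok(disk, pointer):
    data, _ = disk[pointer["addr"]]
    return checksum(data) == pointer["checksum"]

disk = {}
write_inline(disk, 7, b"old contents")

# The new write is acknowledged but silently dropped, so the disk still
# holds the old block while the parent already expects the new one.
pointer = make_pointer(7, b"new contents")

stale_passes_inline = read_inline_ok(disk, 7)          # True: stale but self-consistent
stale_passes_pointer = read_pointer_ok(disk, pointer)  # False: parent notices
```

The stale block still matches its own inline checksum, so only the copy held one level up can tell that the wrong data came back.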

> Robert Milkowski cited some sobering evidence that mid-range arrays  
> may have non-negligible firmware problems that ZFS could often  
> catch, but a) those are hardly 'consumer' products (to address that  
> sub-thread, which I think is what applies in Stefano's case) and b)  
> ZFS's claimed attraction for higher-end (corporate) use is its  
> ability to *eliminate* the need for such products (hence its  
> ability to catch their bugs would not apply - though I can  
> understand why people who needed to use them anyway might like to  
> have ZFS's integrity checks along for the ride, especially when  
> using less-than-fully-mature firmware).

actually on this list we've seen a number of consumer level products  
including sata controllers, and raid cards (which are also becoming  
more commonplace in the consumer realm) that can be confirmed to  
throw data errors.  Code maturity issues aside, there aren't very  
many array vendors that are open-sourcing their array firmware - and  
if you consider zfs as a feature-set that could function as a multi- 
purpose storage array (systems are cheap) - i find it refreshing that  
everything that's being done under the covers is really out in the open.

> And otherwise undetected disk errors occur with negligible  
> frequency compared with software errors that can silently trash  
> your data in ZFS cache or in application buffers (especially in PC  
> environments:  enterprise software at least tends to be more stable  
> and more carefully controlled - not to mention their typical use of  
> ECC RAM).
>
> So depending upon ZFS's checksums to protect your data in most PC  
> environments is sort of like leaving on a vacation and locking and  
> bolting the back door of your house while leaving the front door  
> wide open:  yes, a burglar is less likely to enter by the back  
> door, but thinking that the extra bolt there made you much safer is  
> likely foolish.

granted - it's not an all-in-one solution, but by combining the  
merkle tree approach with the sha256 checksum along with periodic  
data scrubbing - it's a darn good approach .. particularly since it  
also tends to cost a lot less than what you might have to pay  
elsewhere for something you can't really see inside.
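
For what the scrub adds on top of the read-time checks, here is a small Python sketch under loud assumptions: two in-memory dicts stand in for the sides of a mirror, the expected checksums are held separately in a pointers dict, and the scrub function is a hypothetical name, not how any real scrub is implemented.  It walks every block, including ones no application has read lately, and rewrites a bad copy from a good one while a good one still exists.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def scrub(pointers, mirrors):
    """Verify every block on every mirror side; repair bad copies if possible."""
    repaired, unrecoverable = [], []
    for addr, expected in pointers.items():
        # Look for any side whose copy still matches the expected checksum.
        good = next((m[addr] for m in mirrors
                     if checksum(m[addr]) == expected), None)
        if good is None:
            unrecoverable.append(addr)
            continue
        for side, m in enumerate(mirrors):
            if checksum(m[addr]) != expected:
                m[addr] = good
                repaired.append((side, addr))
    return repaired, unrecoverable

blocks = {0: b"alpha", 1: b"beta"}
pointers = {addr: checksum(data) for addr, data in blocks.items()}
mirrors = [dict(blocks), dict(blocks)]

mirrors[1][1] = b"bet?"   # one side quietly goes bad
repaired, unrecoverable = scrub(pointers, mirrors)
```

After the run, repaired lists side 1, block 1 and the bad copy has been rewritten.  Had both sides decayed before anyone looked, the block would land in unrecoverable instead, which is the case for scrubbing on a schedule.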

> Conversely, if you don't care enough about your data to take those  
> extra steps, then adding ZFS's incremental protection won't reduce  
> your net risk by a significant percentage (because the other risks  
> that still remain are so much larger).
>
> Was my point really that unclear before?  It seems as if this must  
> be at least the third or fourth time that I've explained it.

not at all, disasters happen in many ways and forms and you have to  
put in place strategies and protections to deal with as many as you  
can foresee - granted you can never cover all your bases and  
disasters can always find their way through .. but you do seem to be  
repeating the phrase "incremental protection" recently, which i think  
i take issue with.  If you really think about it, everything in life  
is pretty much incremental (even if the size of the increments might  
vary widely) - checksums and scrubbing are only one piece of a larger  
data protection scheme.  They should really be used along with  
snapshots, replication, and backup - but i thought that was a given  
considering what's already built into the filesystem and the wealth  
of other tools we try to share in Solaris.

<snip> <snip> <snip>
too many problems to address .. too little time
---
.je

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
