apologies in advance for prolonging this thread .. i had considered taking this completely offline, but thought of a few people at least who might find this discussion somewhat interesting .. at the least i haven't seen any mention of Merkle trees yet as the nerd in me yearns for
On Dec 5, 2007, at 19:42, bill todd - aka can you guess? wrote:

>> what are you terming as "ZFS' incremental risk reduction"? ..
>> (seems like a leading statement toward a particular assumption)
>
> Primarily its checksumming features, since other open source
> solutions support simple disk scrubbing (which given its ability to
> catch most deteriorating disk sectors before they become unreadable
> probably has a greater effect on reliability than checksums in any
> environment where the hardware hasn't been slapped together so
> sloppily that connections are flaky).

ah .. okay - at first reading "incremental risk reduction" seems to imply an incomplete approach to risk .. putting various creators' and marketing organizations' pride issues aside for a moment, i don't see ZFS as a complete risk reduction - nor should it be billed as such. However i do believe that the use of the merkle tree with a sha256 hash is somewhat of an improvement over conventional volume-based data scrubbing techniques, since the hash tree is integrated with the filesystem block layout and so doubles as a hierarchical data validation method. In addition to finding unknown problem areas with the scrub, you're also doing relatively inexpensive data validation checks on every read.

> Aside from the problems that scrubbing handles (and you need
> scrubbing even if you have checksums, because scrubbing is what
> helps you *avoid* data loss rather than just discover it after it's
> too late to do anything about it), and aside from problems deriving
> from sloppy assembly (which tend to become obvious fairly quickly,
> though it's certainly possible for some to be more subtle),
> checksums primarily catch things like bugs in storage firmware and
> otherwise undetected disk read errors (which occur orders of
> magnitude less frequently than uncorrectable read errors).

sure - we've seen many transport errors, as well as firmware implementation errors ..
in fact with many arrays we've seen data corruption issues with the scrub (particularly if the checksum is singly stored along with the data block) - just like with spam filtering, you really want to eliminate false positives that indicate corruption where there isn't any. if you take some time to read the on-disk format for ZFS you'll see that there's a tradeoff that's made in favor of storing more checksums in many different areas instead of making more room for direct block pointers.

> Robert Milkowski cited some sobering evidence that mid-range arrays
> may have non-negligible firmware problems that ZFS could often
> catch, but a) those are hardly 'consumer' products (to address that
> sub-thread, which I think is what applies in Stefano's case) and b)
> ZFS's claimed attraction for higher-end (corporate) use is its
> ability to *eliminate* the need for such products (hence its
> ability to catch their bugs would not apply - though I can
> understand why people who needed to use them anyway might like to
> have ZFS's integrity checks along for the ride, especially when
> using less-than-fully-mature firmware).

actually on this list we've seen a number of consumer-level products, including sata controllers and raid cards (which are also becoming more commonplace in the consumer realm), that can be confirmed to throw data errors. Code maturity issues aside, there aren't very many array vendors that are open-sourcing their array firmware - and if you consider zfs as a feature-set that could function as a multi-purpose storage array (systems are cheap) - i find it refreshing that everything that's being done under the covers is really out in the open.
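
To make the point about where the checksum lives a bit more concrete, here is a minimal sketch. It is a toy model, not ZFS code and not its on-disk format: `ToyDisk`, `drop_writes`, and the four read/write helpers are hypothetical names made up for illustration. It assumes a firmware bug that acknowledges a write but never lands it on the platter, and compares a checksum stored alongside the data block with one kept in the parent's block pointer.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ToyDisk:
    """Hypothetical block device: address -> raw bytes."""

    def __init__(self):
        self.blocks = {}
        # When True, writes are acknowledged but silently lost
        # (stands in for a firmware or transport bug).
        self.drop_writes = False

    def write(self, addr: int, data: bytes) -> None:
        if not self.drop_writes:
            self.blocks[addr] = data

    def read(self, addr: int) -> bytes:
        return self.blocks[addr]


# Scheme A: the checksum is stored in the same block as the data.
def write_self_checked(disk: ToyDisk, addr: int, payload: bytes) -> None:
    disk.write(addr, sha256(payload) + payload)


def read_self_checked(disk: ToyDisk, addr: int) -> bytes:
    raw = disk.read(addr)
    stored, payload = raw[:32], raw[32:]
    if sha256(payload) != stored:
        raise IOError("checksum mismatch")
    return payload


# Scheme B: the checksum travels with the pointer held by the parent.
def write_with_pointer(disk: ToyDisk, addr: int, payload: bytes):
    disk.write(addr, payload)
    return (addr, sha256(payload))  # toy "block pointer"


def read_with_pointer(disk: ToyDisk, pointer) -> bytes:
    addr, expected = pointer
    payload = disk.read(addr)
    if sha256(payload) != expected:
        raise IOError("checksum mismatch")
    return payload


def demo():
    disk = ToyDisk()
    write_self_checked(disk, 0, b"v1")
    ptr = write_with_pointer(disk, 1, b"v1")

    # Both updates to v2 are acknowledged but never reach the disk.
    # The pointer is assumed to live in a parent that was written fine.
    disk.drop_writes = True
    write_self_checked(disk, 0, b"v2")
    ptr = write_with_pointer(disk, 1, b"v2")
    disk.drop_writes = False

    stale = read_self_checked(disk, 0)  # old block still matches its own checksum
    try:
        read_with_pointer(disk, ptr)
        caught = False
    except IOError:
        caught = True
    return stale, caught
```

In this toy run `demo()` hands back the stale `b"v1"` from the self-checksummed block without complaint, while the pointer-checked read raises. Chain such pointers up through every level to a single root and you get the merkle tree arrangement, where each read is validated against a checksum stored one level up.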
> And otherwise undetected disk errors occur with negligible
> frequency compared with software errors that can silently trash
> your data in ZFS cache or in application buffers (especially in PC
> environments: enterprise software at least tends to be more stable
> and more carefully controlled - not to mention their typical use of
> ECC RAM).
>
> So depending upon ZFS's checksums to protect your data in most PC
> environments is sort of like leaving on a vacation and locking and
> bolting the back door of your house while leaving the front door
> wide open: yes, a burglar is less likely to enter by the back
> door, but thinking that the extra bolt there made you much safer is
> likely foolish.

granted - it's not an all-in-one solution, but by combining the merkle tree approach with the sha256 checksum along with periodic data scrubbing - it's a darn good approach .. particularly since it also tends to cost a lot less than what you might have to pay elsewhere for something you can't really see inside.

> Conversely, if you don't care enough about your data to take those
> extra steps, then adding ZFS's incremental protection won't reduce
> your net risk by a significant percentage (because the other risks
> that still remain are so much larger).
>
> Was my point really that unclear before? It seems as if this must
> be at least the third or fourth time that I've explained it.

not at all, disasters happen in many ways and forms and you have to put in place strategies and protections to deal with as much as you can foresee - granted you can never cover all your bases and disasters can always find their way through .. but you do seem to be repeating the phrase "incremental protection" recently, which i think i take issue with. If you really think about it, everything in life is pretty much incremental (even if the size of the increments might vary widely) - checksums and scrubbing are only one piece of a larger data protection scheme.
This should really be used along with snapshots, replication, and backup - but i thought that was a given considering what's already built into the filesystem and the wealth of other tools we try to share in Solaris.

<snip> <snip> <snip>

too many problems to address .. too little time

---
.je
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss