> One concern that I've always had with compressing backup streams is
> what happens if there is an unrecoverable block of data somewhere in
> the middle of the stream? Do you lose every bit of data after that
> bad block, or is there some ability to resynchronize the stream so you
> don't lose everything?
What do you use? No compression?

The datastream that I want to write to tape is a "zfs send" stream. These are extremely sensitive to data corruption: if a single bit is toggled anywhere in the zfs data stream, the final checksum will fail and the whole stream is lost. So if you know of recoverable compression tools, they're not likely to benefit me, but out of curiosity it would be nice to know...

Since my dataset is an all-or-nothing dataset, compressing it to 50% of its original size makes me roughly 50% less likely to hit any corruption at all, and if I can do that in "realtime", my restore could possibly run twice as fast too. :-D

Also, I suppose it's not impossible to layer some error detection/correction on top of a datastream. I haven't looked for any such thing, but there must be some free libraries out there, right?

> Does your software/algorithm do anything to deal with this or do you
> just get junk after the bad block?

I don't know what others do. But threadzip takes a blocksize as a parameter; the default is 5M. It reads its input in 5M chunks and then uses zlib (same or similar to gzip) to compress those chunks in parallel. The resulting compressed block size is of course variable, but suppose a 2.5M compressed block got corrupted: then there is a 5M block of corrupted output data (any number of bits could be corrupted within that 5M chunk).

During decompression, I don't know how zlib behaves; I suppose it probably throws an exception and crashes. But I think it would be trivial to add a switch to ignore errors and allow erroneous decompression to proceed. It's not something I would normally do; I usually consider corrupted data throw-away data.

_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
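A rough check of the "50% less likely" estimate above, assuming (a simplifying assumption, since real tape errors tend to come in bursts) that every bit is corrupted independently with the same probability p. The chance of at least one bad bit in an n-bit stream is 1 - (1 - p)^n, which is close to n*p while n*p is small, so halving n roughly halves the risk. The bit error rate and stream size below are made-up illustration values, not measurements.

```python
import math


def p_any_error(n_bits: int, ber: float) -> float:
    """Probability that at least one of n_bits is flipped, assuming
    independent errors with per-bit probability ber."""
    # 1 - (1 - ber)**n rewritten with log1p/expm1, which stay accurate
    # when ber is tiny and n_bits is huge.
    return -math.expm1(n_bits * math.log1p(-ber))


# Hypothetical numbers, for illustration only.
BER = 1e-17                  # assumed uncorrectable bit error rate
STREAM_BYTES = 2 * 10**12    # assumed 2 TB zfs send stream

full = p_any_error(STREAM_BYTES * 8, BER)
half = p_any_error(STREAM_BYTES * 8 // 2, BER)  # compressed to 50%

print(f"uncompressed: {full:.3e}")
print(f"compressed:   {half:.3e}")
print(f"ratio:        {half / full:.4f}")
```

The ratio only stays near 0.5 while n*p is small; for much longer streams or noisier media both probabilities approach 1 and compression buys less.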
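On layering error detection/correction on top of a datastream: one of the simplest schemes, swapped in here purely for illustration (it is the same idea as RAID-5 parity), is to keep a CRC32 per block so damage can be detected, plus one XOR parity block per group of blocks so any single bad block in the group can be rebuilt. This is a minimal sketch assuming equal-length blocks (pad the last one) and assuming the CRCs and parity are stored somewhere safe, since in practice they need protecting too; `encode_group` and `decode_group` are hypothetical names. Reed-Solomon codes can repair more than one bad block per group at the cost of more complexity.

```python
import zlib
from functools import reduce
from typing import List, Tuple


def _xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (via big integers, for speed)."""
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).to_bytes(len(a), "big")


def encode_group(blocks: List[bytes]) -> Tuple[List[int], bytes]:
    """Return one CRC32 per block (detection) and one XOR parity block (correction)."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("pad all blocks to the same length first")
    crcs = [zlib.crc32(b) for b in blocks]
    parity = reduce(_xor, blocks)
    return crcs, parity


def decode_group(blocks: List[bytes], crcs: List[int], parity: bytes) -> List[bytes]:
    """Verify every block; rebuild at most one bad block from the parity."""
    bad = [i for i, b in enumerate(blocks) if zlib.crc32(b) != crcs[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError(f"{len(bad)} bad blocks in group; XOR parity can fix only one")
    i = bad[0]
    # parity = b0 ^ b1 ^ ... ^ bn, so XOR-ing the parity with every good
    # block leaves exactly the missing one.
    others = [b for j, b in enumerate(blocks) if j != i]
    rebuilt = reduce(_xor, others, parity)
    if zlib.crc32(rebuilt) != crcs[i]:
        raise ValueError("parity or checksums are damaged as well")
    fixed = list(blocks)
    fixed[i] = rebuilt
    return fixed
```

A bigger group means less parity overhead but a higher chance of two bad blocks landing in the same group, which this scheme cannot repair.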
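The chunked approach described above for threadzip can be sketched roughly as follows. This is not threadzip's code or on-disk format: the 4-byte length-prefixed record layout, the names `compress_stream`/`decompress_stream`, and the zero-fill behaviour of the `ignore_errors` switch are all hypothetical. Because each chunk is compressed as its own self-contained zlib stream, a damaged chunk makes `zlib.decompress` raise `zlib.error`, and with the switch on the reader can substitute zeros for that one chunk and carry on with the next record instead of losing everything after it.

```python
import io
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Iterator

BLOCK_SIZE = 5 * 1024 * 1024   # 5M input chunks, mirroring threadzip's default
HEADER = struct.Struct(">I")   # hypothetical framing: 4-byte big-endian payload length


def read_chunks(src: BinaryIO, size: int) -> Iterator[bytes]:
    """Yield fixed-size chunks of the input (the last one may be shorter)."""
    while chunk := src.read(size):
        yield chunk


def compress_stream(src: BinaryIO, dst: BinaryIO,
                    block_size: int = BLOCK_SIZE, workers: int = 4) -> None:
    """Compress each chunk as an independent zlib stream, concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so records stay in sequence.
        # Note: it submits every chunk up front; a real tool would bound this.
        for block in pool.map(zlib.compress, read_chunks(src, block_size)):
            dst.write(HEADER.pack(len(block)))
            dst.write(block)


def decompress_stream(src: BinaryIO, dst: BinaryIO,
                      block_size: int = BLOCK_SIZE,
                      ignore_errors: bool = False) -> int:
    """Decompress records in order; return the number of bad blocks skipped."""
    bad = 0
    while header := src.read(HEADER.size):
        (length,) = HEADER.unpack(header)
        payload = src.read(length)
        try:
            dst.write(zlib.decompress(payload))
        except zlib.error:
            if not ignore_errors:
                raise  # default: treat corrupted data as fatal
            bad += 1
            # Hypothetical "ignore errors" switch: emit zeros in place of the
            # lost chunk so later data keeps its offset. This assumes the bad
            # chunk was a full block_size chunk, not the short final one.
            dst.write(bytes(block_size))
    return bad


if __name__ == "__main__":
    # Round trip with small chunks so the demo runs quickly.
    data = bytes(range(256)) * 4096
    packed = io.BytesIO()
    compress_stream(io.BytesIO(data), packed, block_size=64 * 1024)
    restored = io.BytesIO()
    decompress_stream(io.BytesIO(packed.getvalue()), restored, block_size=64 * 1024)
    assert restored.getvalue() == data
```

One caveat with this simple framing: if the corruption lands in a length header rather than in a payload, the reader loses its place and everything after it is misparsed. Resynchronizing after that would need something extra, such as sync markers between records or a separate block index.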
