> One concern that I've always had with compressing backup streams is
> what happens if there is an unrecoverable block of data somewhere in
> the middle of the stream?  Do you lose every bit of data after that bad
> block, or is there some ability to resynchronize the stream so you
> don't lose everything?

What do you use?  No compression?

The datastream that I want to write to tape is a "zfs send" stream.  These
are extremely sensitive to data corruption.  If a single bit is toggled
anywhere in the zfs data stream, then the final checksum will fail, and the
whole stream is lost.  So, if you know of recoverable compression tools,
they're not likely to benefit me, but out of curiosity it would be nice to
know...

Since my dataset is all-or-nothing, if I compress to 50% of the original
size, the stream occupies half as much tape, so it is roughly half as likely
to be hit by any corruption (assuming errors are spread evenly over the
media).  And if I can do that in "realtime", my restore could possibly run
twice as fast too.  :-D
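
As a rough check on that arithmetic, here is a minimal sketch.  It assumes
independent bit errors at a fixed rate; the rate (BER) and the stream sizes
below are made-up values for illustration, not measurements from any real
tape drive.

```python
import math

def p_corrupt(n_bytes, bit_error_rate):
    """Probability that at least one bit in n_bytes is flipped,
    assuming independent bit errors."""
    n_bits = 8 * n_bytes
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(n_bits * math.log1p(-bit_error_rate))

BER = 1e-15                        # hypothetical bit error rate
full = p_corrupt(10**12, BER)      # hypothetical 1 TB uncompressed stream
half = p_corrupt(5 * 10**11, BER)  # the same stream compressed to 50%

print(full, half, half / full)
```

As long as the overall failure probability is small, the ratio comes out
very close to one half.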

Also, I suppose, it's not impossible to layer some error
detection/correction on top of a datastream.  I haven't looked for any such
thing, there must be some free libraries out there, right?  
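
One simple way to layer error detection on a stream might look like the
sketch below: split the data into records and prefix each with its length
and a CRC32.  The record layout and the names frame/unframe are invented
for this example.  Note that this only detects damage; correcting it would
need real parity data (Reed-Solomon or similar), which is not shown here.

```python
import struct
import zlib

# Hypothetical record header: payload length and CRC32, both big-endian uint32
HEADER = struct.Struct(">II")

def frame(chunks):
    """Yield each chunk prefixed with its length and CRC32."""
    for chunk in chunks:
        yield HEADER.pack(len(chunk), zlib.crc32(chunk)) + chunk

def unframe(data):
    """Yield (ok, payload) for each record found in a bytes object."""
    pos = 0
    while pos < len(data):
        length, crc = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        payload = data[pos:pos + length]
        pos += length
        ok = len(payload) == length and zlib.crc32(payload) == crc
        yield ok, payload

blob = b"".join(frame([b"first record", b"second record"]))
damaged = bytearray(blob)
damaged[HEADER.size] ^= 0x01  # flip one bit in the first payload
for ok, payload in unframe(bytes(damaged)):
    print(ok, payload)
```

A flipped bit in a payload fails only that record's check.  If a length
field itself is damaged, this simple layout loses track of the record
boundaries, so a real format would also want some kind of sync marker.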


> Does your
> software/algorithm do anything to deal with this or do you just get
> junk after the bad block?  

I don't know what others do.  But threadzip takes a blocksize as a
parameter.  The default is 5M.  So it reads input in 5M chunks, and then uses
zlib (the same DEFLATE compression that gzip uses) to compress those chunks
in parallel.  The size of each compressed chunk is of course variable, but
suppose a 2.5M compressed chunk gets corrupted:  then the 5M chunk it
decompresses to is corrupt (any number of bits could be wrong within that
5M).  During decompression, I don't know the behavior of zlib; I suppose
it probably throws an exception and crashes.  But I think it would be
trivial to add a switch to ignore errors and let the erroneous decompression
proceed.  It's not something I would normally do; I usually consider
corrupted data throw-away data.
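
The chunking idea can be sketched roughly as follows.  This is not
threadzip's actual code or on-disk format; the function names and the
ignore_errors switch are hypothetical.  In Python's zlib module, input that
fails to decompress raises zlib.error, so that is what the switch catches.
Unlike the "erroneous decompression" idea above, this sketch marks a bad
chunk as lost (None) instead of emitting whatever garbage came out.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(data, blocksize=5 * 1024 * 1024, workers=4):
    """Split data into fixed-size chunks and compress each independently."""
    chunks = [data[i:i + blocksize] for i in range(0, len(data), blocksize)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))

def decompress_chunks(compressed, ignore_errors=False):
    """Decompress each chunk; optionally replace bad chunks with None."""
    out = []
    for blob in compressed:
        try:
            out.append(zlib.decompress(blob))
        except zlib.error:
            if not ignore_errors:
                raise
            out.append(None)  # this chunk is lost, the others survive
    return out

data = bytes(range(256)) * 100
blobs = compress_chunks(data, blocksize=4096)
# Damage the last byte of the second chunk (part of zlib's checksum trailer)
blobs[1] = blobs[1][:-1] + bytes([blobs[1][-1] ^ 0xFF])
result = decompress_chunks(blobs, ignore_errors=True)
print([r is not None for r in result])
```

Because each chunk is compressed on its own, damage to one chunk does not
stop the others from decompressing, provided the chunk boundaries can still
be located.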


_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
