On Tue, Dec 8, 2009 at 8:08 PM, Edward Ned Harvey <[email protected]> wrote:

>> One concern that I've always had with compressing backup streams is
>> what happens if there is an unrecoverable block of data somewhere in
>> the middle of the stream?  Do you lose every bit of data after that
>> bad block, or is there some ability to resynchronize the stream so
>> you don't lose everything?
>
> What do you use?  No compression?
Typically.

> ....
> detection/correction on top of a datastream.  I haven't looked for any
> such thing, there must be some free libraries out there, right?

I've always thought that forward error correction might be relevant,
but I haven't really looked into it.

>> Does your software/algorithm do anything to deal with this, or do
>> you just get junk after the bad block?
>
> I don't know what others do.  But threadzip takes as a parameter a
> blocksize.  Default is 5M.  So it reads input in 5M chunks, and then
> uses zlib (same or similar to gzip) to compress those chunks in
> parallel.  The resultant compressed chunk blocksize is of course
> variable, but suppose there's a 2.5M compressed block that got
> corrupted: then there is a 5M block of corrupted data (any number of
> bits could be corrupted within the 5M chunk).  During decompression,
> I don't know the behavior of zlib; I suppose it probably throws an
> exception and crashes.  But I think it would be trivial to add a
> switch to ignore errors and allow erroneous decompression to take
> place.  It's not something I would normally do.  I usually consider
> corrupted data throw-away data.

Here is a link to some software that works with gzip/cpio and tries to
deal with this:

http://www.urbanophile.com/arenn/coding/gzrt/

And here is something about generic gzip and tar:

http://www.gzip.org/recover.txt

With something like tar/cpio or any other program that does file
backups, it would seem to me that the compression should really be
done on a per-file basis, with some kind of small header placed at the
beginning of each compressed file.  This would make it much easier to
handle recovery.  Depending on the underlying block size of the
storage media (and file sizes), you might end up losing just one file
per bad block.

Bill Bogstad
_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
