> > Of course, you're right :) (although pigz failed only in 2 files out
> > of several thousands).
>
> oh well, I was wondering about that. I've yet to see such a file (and
> probably never will, because I disabled pool compression for good and
> now use btrfs' lzop filesystem-based compression), but...
I've found a third, all of them 6GB+ ISOs. I'm starting to see a pattern :P

> > BackupPC_zcat decompresses both files correctly and their checksums
> > are correct now. However, at least with one of the files there is
> > something fishy going on because the compressed version is 60KB, the
> > decompressed is 7GB!
>
> I'd bet that those two files are extremely sparse.
>
> There are good reasons for such a file to be generated: e.g., from a
> ddrescue run that skipped lots of bad areas on a drive, or a VM disk
> image with a recently formatted partition, or similar. On many modern
> file systems supporting sparse files, the overhead for the holes in the
> file is negligible, so it's easier from a user perspective to allocate
> the "full" file and rely on the filesystem's abilities to optimize
> storage and access.
>
> However, some of BackupPC's transfer methods (in particular, rsync)
> cannot treat sparse files natively, but since they compress so well,
> that's hardly an issue for transfer or storage on the server.

Thanks for the nice explanation. Unfortunately, in this case there was a
rather more mundane reason, like me failing to properly read the number
of digits of a big number...

> The reason why I recommended pigz (unfortunately without an appropriate
> disclaimer) is that it
> - never failed on me, for the files I had around at that time, and
> - it was *magnitudes* faster than BackupPC_zcat.
>
> But I had a severely CPU-limited machine; YMMV with a more powerful CPU.
> Depending on your use case (and performance experience), it might still
> be clever to run pigz first and only run BackupPC_zcat if there is a
> mismatch. If a pigz-decompressed file matches the expected hash, I'd bet
> at approximately 1 : 2^64 that no corruption happened.

I'm very severely CPU-limited (Banana Pi), so this can make a huge
difference. I tested it by checking two top-level cpool dirs (roughly
1/64 ~ 1.5% of the pool). I compared pigz, zlib-flate and BackupPC_zcat,
and on my system:

- both pigz and zlib-flate are much faster than BackupPC_zcat; they take
  around a quarter of the time to check the files
- pigz is marginally faster than zlib-flate
- BackupPC_zcat puts the lowest load on the CPU; zlib-flate's load is
  30-35% higher, and pigz's is a whopping 80-100% higher (pigz's load is
  actually higher than 2 on this 2-core system)
- of course, BackupPC_zcat has the advantage of always working;
  zlib-flate and pigz fail on the same (very few) files

With this data, I modified my script to normally run zlib-flate to check
the files, and to re-check every failure with BackupPC_zcat before
calling it a real error (a rough sketch of the core loop is at the end of
this mail). I think this gets the best balance between load on the system
and time spent checking the pool (I can traverse the entire pool in 32
days with ~30 min of checking every day).

> > I'll check those 2 files tonight, and hopefully have a script working
> > by the weekend.
>
> Cool! If you don't mind and are allowed to, please share here...

The check script is almost there; I want to verify it for a couple of
days more before sharing it. The find script seems a bit harder to code
than I first thought :)

Cheers,
Guillermo
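
P.S. Here is a rough, *untested* sketch of the core check loop mentioned
above, not the real script (which still needs a couple of days of
verification). It assumes a v4-style cpool where a pool file's name is the
MD5 digest of its uncompressed contents; the cpool path and the
BackupPC_zcat location are just examples from my setup, so adjust them
(and any handling of collision chains) for yours.

#!/bin/sh
# Rough sketch only. Assumes a BackupPC v4-style cpool where each pool
# file's name is the MD5 digest of its uncompressed contents; collision
# chains and local paths may need extra handling.

CPOOL=${1:-/var/lib/backuppc/cpool/00}      # example: one top-level cpool dir
ZCAT=/usr/share/backuppc/bin/BackupPC_zcat  # location varies per install

find "$CPOOL" -type f | while read -r f; do
    name=$(basename "$f")

    # Fast path: zlib-flate (from qpdf) -- much faster than BackupPC_zcat here
    sum=$(zlib-flate -uncompress < "$f" 2>/dev/null | md5sum | cut -d' ' -f1)

    if [ "$sum" != "$name" ]; then
        # zlib-flate choked or the hashes differ: re-check with BackupPC_zcat
        # before calling it a real error
        sum=$("$ZCAT" "$f" 2>/dev/null | md5sum | cut -d' ' -f1)
        if [ "$sum" != "$name" ]; then
            echo "CORRUPT: $f"
        fi
    fi
done

The nice property is that the slow BackupPC_zcat path only ever runs on
the handful of files where zlib-flate fails, so the extra CPU cost is
negligible.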