>
> > Of course, you're right :) (although pigz failed on only 2 files out of
> > several thousand).
>
> oh well, I was wondering about that. I've yet to see such a file (and
> probably never will, because I disabled pool compression for good and
> now use btrfs' lzo filesystem-based compression), but...
>

I've found a third, all of them 6GB+ ISOs. I'm starting to see a pattern :P

> > BackupPC_zcat decompresses both files correctly and their checksums are
> > correct now. However, at least with one of the files there is something
> > fishy going on, because the compressed version is 60KB while the
> > decompressed one is 7GB!
>
> I'd bet that those two files are extremely sparse.
> There are good reasons for such a file to be generated: e.g., from a
> ddrescue run that skipped lots of bad areas on a drive, or a VM disk
> image with a recently formatted partition, or similar. On many modern
> file systems supporting sparse files, the overhead for the holes in the
> file is negligible, so it's easier from a user perspective to allocate
> the "full" file and rely on the filesystem's abilities to optimize
> storage and access.
> However, some of BackupPC's transfer methods (in particular, rsync)
> cannot handle sparse files natively, but since such files compress so
> well, that's hardly an issue for transfer or storage on the server.
>

Thanks for the nice explanation. Unfortunately, in this case the reason was
rather more mundane: me failing to properly read the number of digits
of a big number...
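
Still, for future reference: a quick way to tell whether a decompressed file
really is sparse is to compare its allocated blocks with its apparent size.
A rough Python sketch (nothing BackupPC-specific, just stat):

import os
import sys

# Rough sketch: compare a file's apparent size with its allocated size to
# spot sparse files (allocated well below apparent => sparse).
for path in sys.argv[1:]:
    st = os.stat(path)
    apparent = st.st_size            # size as shown by ls -l
    allocated = st.st_blocks * 512   # st_blocks is counted in 512-byte units
    tag = "sparse" if allocated < apparent else "dense"
    print(f"{path}: apparent={apparent} allocated={allocated} ({tag})")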


> The reason why I recommended pigz (unfortunately without an appropriate
> disclaimer) is that it
> - never failed for me, for the files I had around at that time, and
> - was *orders of magnitude* faster than BackupPC_zcat.
>
> But I had a severely CPU-limited machine; YMMV with a more powerful CPU.
> Depending on your use case (and performance experience), it might still
> be clever to run pigz first and only run BackupPC_zcat if there is a
> mismatch. If a pigz-decompressed file matches the expected hash, I'd put
> the odds of corruption slipping through at roughly 1 : 2^64.
>

I'm very severely CPU-limited (Banana Pi), so this can make a huge
difference. I tested it by checking two top-level cpool dirs (roughly 1/64,
~1.5% of the pool). I compared pigz, zlib-flate and BackupPC_zcat, and on
my system:
- both pigz and zlib-flate are much faster than BackupPC_zcat; they take
around a quarter of the time to check the files
- pigz is marginally faster than zlib-flate
- BackupPC_zcat puts the lowest load on the CPU; zlib-flate's load is 30-35%
higher, and pigz's is a whopping 80-100% higher (pigz's load is actually
above 2 on this 2-core system)
- of course, BackupPC_zcat has the advantage of always working; zlib-flate
and pigz fail on the same files (very few)
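
In case anyone wants to reproduce the comparison, here is a rough sketch of
the kind of harness I mean (not my actual script). The tool paths and flags
are assumptions: BackupPC_zcat's location varies per distribution, and the
exact pigz/zlib-flate invocations may need tweaking for your pool format.

import subprocess, sys, time
from pathlib import Path

# Rough timing harness: decompress every file under a sample cpool directory
# with each tool and report elapsed time plus failure count.
BPC_ZCAT = "/usr/share/backuppc/bin/BackupPC_zcat"   # adjust for your install

def decompress_ok(tool, path):
    if tool == "BackupPC_zcat":
        cmd, stdin = [BPC_ZCAT, str(path)], None      # takes the file as argument
    elif tool == "pigz":
        cmd, stdin = ["pigz", "-dc"], open(path, "rb")
    else:  # zlib-flate
        cmd, stdin = ["zlib-flate", "-uncompress"], open(path, "rb")
    try:
        return subprocess.run(cmd, stdin=stdin, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode == 0
    finally:
        if stdin:
            stdin.close()

sample = [p for p in Path(sys.argv[1]).rglob("*") if p.is_file()]
for tool in ("zlib-flate", "pigz", "BackupPC_zcat"):
    start = time.monotonic()
    failed = sum(1 for p in sample if not decompress_ok(tool, p))
    print(f"{tool}: {time.monotonic() - start:.1f}s, {failed} failed")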

With this data, I modified my script to normally run zlib-flate to check
the files, and to re-check every failure with BackupPC_zcat before calling it
a real error. I think this strikes the best balance between load on the system
and time spent checking the pool (I can traverse the entire pool in 32 days
with ~30 min of checking every day).
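
To give an idea of what it does while I finish cleaning it up, the core logic
is roughly the sketch below (not the actual script). It assumes a BackupPC
4.x-style pool, where a file's name is the MD5 digest of its uncompressed
contents, and the BackupPC_zcat path is only an example:

import hashlib, subprocess, sys
from pathlib import Path

BPC_ZCAT = "/usr/share/backuppc/bin/BackupPC_zcat"   # adjust for your install

def md5_of_output(cmd, stdin_path=None):
    """Run cmd, stream its stdout through MD5; return hex digest, or None on error."""
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        proc = subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL)
        h = hashlib.md5()
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            h.update(chunk)
        return h.hexdigest() if proc.wait() == 0 else None
    finally:
        if stdin:
            stdin.close()

def check_pool_file(path):
    expected = path.name   # assumed: pool file name == MD5 of uncompressed data
    # Fast path: zlib-flate (about 4x faster wall-clock than BackupPC_zcat here).
    if md5_of_output(["zlib-flate", "-uncompress"], stdin_path=path) == expected:
        return True
    # Fallback: BackupPC_zcat copes with the few files zlib-flate fails on.
    return md5_of_output([BPC_ZCAT, str(path)]) == expected

for p in Path(sys.argv[1]).rglob("*"):
    if p.is_file() and not check_pool_file(p):
        print(f"REAL ERROR: {p}")

Streaming the decompressed output through MD5 keeps memory use flat even for
the multi-GB files mentioned above.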

> > I'll check those 2 files tonight, and hopefully
> > have a script working by the weekend.
>
> Cool! If you don't mind and are allowed to, please share here...
>

The check script is almost there; I want to verify it for a couple more days
before sharing it. The find script seems a bit harder to code than
I first thought :)

Cheers,
Guillermo
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
