John Pettitt wrote:
> Evren Yurtesen wrote:
>> BackupPC Manual mentions:
>>
>> ----------------------------------------------------
>> Each file is examined by generating block checksums (default 2K
>> blocks) on the receiving side (that's the BackupPC side), sending
>> those checksums to the client, where the remote rsync matches those
>> checksums with the corresponding file. The matching blocks and new
>> data are sent back, allowing the client file to be reassembled. A
>> checksum for the entire file is sent too, as an extra check that the
>> reconstructed file is correct.
>>
>> This results in significant disk IO and computation for BackupPC:
>> every file in a full backup, or any file with non-matching attributes
>> in an incremental backup, needs to be uncompressed, block checksums
>> computed and sent. Then the receiving side reassembles the file and
>> has to verify the whole-file checksum. Even if the file is identical,
>> prior to 2.1.0, BackupPC had to read and uncompress the file twice,
>> once to compute the block checksums and later to verify the whole-file
>> checksum.
>> ----------------------------------------------------
>>
>> Why is it actually necessary to do this checksum checking?
>>
> If you turn on checksum caching (see the manual) it doesn't read every
> file every time on the server (just a random sample to ensure that
> nothing nasty has happened to the pool). It also doesn't read every
> file client-side for incrementals, just for full backups.
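For illustration, here is a rough Python sketch of the exchange the manual
describes: the receiving side checksums each block of its existing copy, the
client compares those checksums against its current file and sends back block
references plus any new data, and a whole-file checksum guards the reassembly.
This is only a simplification under assumed hash choices; real rsync uses a
rolling weak checksum so matches can start at any byte offset, and BackupPC
drives the protocol through File::RsyncP against compressed pool files.

----------------------------------------------------
import hashlib
import zlib

BLOCK = 2048  # the "default 2K blocks" mentioned in the manual

def block_checksums(data):
    """Receiving side (BackupPC): checksum each block of its copy of the file."""
    sums = []
    for off in range(0, len(data), BLOCK):
        blk = data[off:off + BLOCK]
        sums.append((zlib.adler32(blk), hashlib.md5(blk).hexdigest()))
    return sums

def delta(client_data, server_sums):
    """Client side: emit ('match', block_index) or ('data', bytes) tokens,
    plus a whole-file checksum for the final verification."""
    index = {s: i for i, s in enumerate(server_sums)}
    tokens = []
    for off in range(0, len(client_data), BLOCK):
        blk = client_data[off:off + BLOCK]
        key = (zlib.adler32(blk), hashlib.md5(blk).hexdigest())
        if key in index:
            tokens.append(('match', index[key]))  # block unchanged: send a reference
        else:
            tokens.append(('data', blk))          # new data: send it literally
    return tokens, hashlib.md5(client_data).hexdigest()

def reassemble(server_data, tokens, whole_file_sum):
    """Receiving side: rebuild the file from matches plus literal data, then
    verify the whole-file checksum as the extra correctness check."""
    parts = []
    for kind, val in tokens:
        if kind == 'match':
            parts.append(server_data[val * BLOCK:(val + 1) * BLOCK])
        else:
            parts.append(val)
    rebuilt = b''.join(parts)
    assert hashlib.md5(rebuilt).hexdigest() == whole_file_sum
    return rebuilt

old = b'A' * 5000                                  # previous backup of the file
new = b'A' * 2048 + b'B' * 100 + b'A' * 2852       # client's current version
tokens, csum = delta(new, block_checksums(old))
assert reassemble(old, tokens, csum) == new        # only the changed block travels as literal data
----------------------------------------------------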
Yes, I read it but it is a little bit confusing, it says:

----------------------------------------------------
BackupPC had to read and uncompress the file twice, once to compute the
block checksums and later to verify the whole-file checksum. Starting
in 2.1.0, BackupPC supports optional checksum caching, which means the
block and file checksums only need to be computed once for each file.
This results in a significant performance improvement. This only works
for compressed pool files. It is enabled by adding
----------------------------------------------------

First it says that it had to uncompress the file twice, then it says
that with caching it has to do it once. Here I thought it meant 'once
at each backup session' instead of twice, not 'once and never again'.
So it only checks them rarely when checksum caching is enabled, which
is a good thing :)

>> Wouldn't it be enough to find files with non-matching attributes and
>> back them up?
>>
> That's what it does for incremental backups - it says so in the text
> you quoted.

>> I think that in most cases if at least the modification time is
>> different then the file should be backed up anyway, no? At least
>> there can be situations where the last modification time of a file
>> is more important than its contents (I don't see how, but it is a
>> possibility).
>>
> The checksum in rsync is more about reducing data on the wire than it
> is about deciding what gets copied. If the attributes have changed it
> will get backed up, but only the data that has changed will actually
> get sent across the wire.

Well, as long as it won't be calculated at each backup session, I have
nothing against it. :)

Perhaps it could be a feature if the checksum checks could be disabled
altogether for situations where bandwidth is cheap but CPU time is
expensive?

Thanks,
Evren
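To make the caching behaviour concrete, here is a minimal Python sketch of
the idea discussed above: compute a file's checksum the first time it is
seen, cache it, and on later runs only re-read a small random sample of
files to confirm that nothing nasty has happened to the pool. The cache
layout, function names and the 1% sample rate are illustrative assumptions,
not BackupPC internals (as far as I recall, BackupPC stores the cached
checksums by appending them to the compressed pool files themselves).

----------------------------------------------------
import hashlib
import random

VERIFY_PROB = 0.01        # re-check roughly 1% of cached files per run
_checksum_cache = {}      # pool file path -> whole-file digest

def file_digest(path):
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

def cached_digest(path):
    """Return the file's digest, computing it only on first sight or when
    this file happens to be picked for random re-verification."""
    if path not in _checksum_cache:
        _checksum_cache[path] = file_digest(path)   # first full backup: compute once
    elif random.random() < VERIFY_PROB:
        fresh = file_digest(path)                   # random pool sanity check
        if fresh != _checksum_cache[path]:
            raise RuntimeError("pool file changed on disk: " + path)
    return _checksum_cache[path]
----------------------------------------------------

If I remember the docs correctly, the fraction of cached files that gets
re-verified in the real implementation is controlled by
$Conf{RsyncCsumCacheVerifyProb}.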