Hi! Thanks for the answer

> > ... rsync --checksum only checksums files on the client, not the
> > server.  I find this strange because not only the manual says
> > otherwise ...
> It is not clear to me what document ("manual") you are reading which
> leads you to the conclusions which you seem to have drawn.  If you can
> give links to the document(s), and quote(s), that might assist.

I was reading the same passages you mentioned, but interpreting them in a
different way.

> [quote]
> *   Uses full-file MD5 digests, which are stored in the directory attrib
>      files. Each backup directory only contains an empty attrib file whose
>      name includes its own MD5 digest, which is used to look up the attrib
>      file's contents in the pool. In turn, that file contains the metadata
>      for every file in that directory, including each files's MD5 digest.
> [/quote]
> I take this to mean that, in order to find the checksums for the files
> on the client, the server looks in the files in its data directory for
> that client precisely because, when it does so, it does NOT then need
> to read pool files (to re-calculate the checksums) because it has done
> that work already and saved the results in the filesystem.

Agreed, this could be an interpretation. However, a bit below it says:

*  rsync-bpc doesn't support checksum caching

Which I interpreted as 'it uses the MD5 digest names only for file
reference, but it doesn't rely on them for file integrity; therefore, it
will checksum the files again'. After that, my mind was set: I knew
BackupPC already had the checksums, but I thought they were not used by
rsync-bpc.

Your email prompted me to re-examine that, and sure enough there is this
comment in https://github.com/backuppc/rsync-bpc/blob/master/checksum.c:

* Try to grab the digest from the attributes, which are both MD5 for
protocol >= 30.
* Otherwise fall through and do it the slow way.

So does this settle the question? In V4, rsync-bpc uses the attributes'
MD5 digests as a cache for the full-file checksum (which is what
--checksum uses), but it has no caching for the block checksums (used
with --ignore-times)?
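If that reading is right, the fast path / slow path split from the checksum.c comment above can be sketched roughly like this (a minimal illustration only, not the actual rsync-bpc code; the `attrib_cache` dict standing in for the attrib-file metadata is my own invention):

```python
import hashlib

def file_digest(path, attrib_cache):
    """Return the full-file MD5 digest for `path`.

    Mirrors the idea in rsync-bpc's checksum.c comment: try the digest
    already stored in the attributes (here a hypothetical dict), and
    otherwise fall through and hash the file "the slow way".
    """
    cached = attrib_cache.get(path)   # digest saved at backup time
    if cached is not None:
        return cached                 # fast path: no file I/O needed
    h = hashlib.md5()
    with open(path, "rb") as f:       # slow path: read and hash the file
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```

On the fast path the pool file is never read, which would explain why --checksum does not re-verify the server-side data each run.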

> using this approach, you rely on the integrity of the previously saved
> pool data.

Agreed, that is my situation. I'm reasonably sure of the system (UPS,
Debian stable, ext4), but as my backups are relatively small (<1 TB) I
can trade a few extra hours of backup once in a while for the extra
peace of mind.

> This seems to me further to confirm my interpretation of the earlier
> quote, and also to suggest the behaviour which you yourself describe
> in your posts.  It explicitly refers to "a more conservative approach"
> which may be what you want.

Yes. However, as the same documentation says:

* The use of rsync --checksum allows BackupPC to guess a potential match
anywhere in the pool, even on a first-time backup. In that case, the usual
rsync block checksums are still exchanged to make sure the complete file is
identical.
I thought it would be better to use --checksum. But if --checksum doesn't
actually checksum the files on the server each time, I agree that
--ignore-times is a better fit for my use case at this point. Thanks.

BackupPC-users mailing list
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
