Jeff,

Yes, that's correct.

In v4, a full backup using --checksum compares all the metadata and the
full-file checksum.  Any file that matches on all of those is presumed
unchanged.  The server load for a v4 full is very low, since all that
metadata (including the full-file checksum) is stored and easily accessed
without needing to read the file contents at all.  An incremental backup
checks only the metadata and not the full-file checksum, which is fast
on both the server and client side.  v4 also supports incremental-only
backups (by periodically filling a backup), in cases where that is
sufficient.  However, that's riskier and not the default.
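
As a rough illustration of the difference between the two checks, here is
a minimal Python sketch.  It is not rsync_bpc's actual code: the FileInfo
class, the helper names, and the exact attribute set (size, mtime, number
of hardlinks, taken from the config.pl text quoted below) are assumptions
made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical per-file record; the field set is an assumption."""
    size: int
    mtime: int
    nlinks: int
    md5: str  # full-file digest


def describe(content: bytes, mtime: int, nlinks: int = 1) -> FileInfo:
    """Build a FileInfo from raw content (stands in for reading a file)."""
    return FileInfo(len(content), mtime, nlinks,
                    hashlib.md5(content).hexdigest())


def metadata_matches(client: FileInfo, server: FileInfo) -> bool:
    """Compare only the cheap attributes, never the contents."""
    return ((client.size, client.mtime, client.nlinks)
            == (server.size, server.mtime, server.nlinks))


def incremental_skips(client: FileInfo, server: FileInfo) -> bool:
    """Incremental: metadata only."""
    return metadata_matches(client, server)


def full_skips(client: FileInfo, server: FileInfo) -> bool:
    """Full with --checksum: metadata plus full-file checksum."""
    return metadata_matches(client, server) and client.md5 == server.md5


# Same size and mtime, different content (the rebuilt-package case).
old = describe(b"build-A", mtime=1591500000)
new = describe(b"build-B", mtime=1591500000)
print("incremental skips file:", incremental_skips(new, old))
print("full skips file:       ", full_skips(new, old))
```

In this sketch the incremental check treats the rebuilt file as unchanged,
while the full check notices the differing digest and would transfer it.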

In v3, a full backup checks the block-based deltas and full-file checksum
for every file.  That's a lot more work and seems unnecessary.  You can get
that behavior in v4 too by replacing --checksum with --ignore-times, but
it's a lot more expensive on the server side since v4 doesn't cache the
block and full-file checksums.

While MD5 collisions can be constructed with various properties, the chance
of a random file change creating a hash collision is 2^-128, as you note.
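
To give a sense of scale, here is a small Python sketch of that
arithmetic.  It assumes MD5 output behaves like a uniformly random 128-bit
value for non-adversarial changes; the collision_bound helper and the
figure of 10^12 file changes are made up for illustration.

```python
from fractions import Fraction

# Chance that one random change yields the same 128-bit digest,
# assuming digests are uniformly distributed (an idealization).
p_single = Fraction(1, 2**128)


def collision_bound(num_changes: int) -> Fraction:
    """Union bound on at least one missed change among num_changes."""
    return min(Fraction(1), num_changes * p_single)


print(float(p_single))                  # on the order of 1e-39
print(float(collision_bound(10**12)))   # still vanishingly small
```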

Craig


On Sun, Jun 7, 2020 at 9:11 PM <backu...@kosowsky.org> wrote:

> Silly me... the '--checksum' is only for 'Full' so that explains the
> difference between 'incrementals' and 'fulls'... along with presumably
> why my case wasn't caught by an incremental.
>
> I still don't fully understand the comment referencing V3 and replacing
> --checksum with --ignore-times.
>
> Is the point that v3 compared both full file and block
> checksums while in v4 --checksum only compares full file checksums?
> And so v3 is more conservative since there might be checksum
> collisions of 2 non-identical files at the file-checksum level that
> would be unmasked by checksum differences at the block level?
> (presumably a very rare event -- presumably < 2^-128 since the hash
> itself is 128 bits and the times and size are also checked)
>
> "" wrote at about 23:54:14 -0400 on Sunday, June 7, 2020:
>  > Can someone clarify how --checksum works in v4?
>  > And specifically, when could it get 'fooled' thinking 2 files are
>  > identical when they really aren't...
>  >
>  > According to config.pl:
>  >
>  >    The --checksum argument causes the client to send full-file
>  >    checksum for every file (meaning the client reads every file and
>  >    computes the checksum, which is sent with the file list).  On the
>  >    server, rsync_bpc will skip any files that have a matching
>  >    full-file checksum, and size, mtime and number of hardlinks.  Any
>  >    file that has different attributes will be updated using the block
>  >    rsync algorithm.
>  >
>  >    In V3, full backups applied the block rsync algorithm to every
>  >    file, which is a lot slower but a bit more conservative.  To get
>  >    that behavior, replace --checksum with --ignore-times.
>  >
>  >
>  > While according to the 'rsync' man pages:
>  >    -c, --checksum
>  >    This changes the way rsync checks if the files have been changed
>  >    and are in need of a transfer.  Without this option, rsync uses a
>  >    "quick check" that (by default) checks if each file’s size and time
>  >    of last modification match between the sender and receiver.  This
>  >    option changes this to compare a 128-bit checksum for each file
>  >    that has a matching size.  Generating the checksums means that both
>  >    sides will expend a lot of disk I/O reading all the data in the
>  >    files in the transfer (and this is prior to any reading that will
>  >    be done to transfer changed files), so this can slow things down
>  >    significantly.
>  >
>  >
>  > Note by default:
>  > $Conf{RsyncFullArgsExtra} = ['--checksum'];
>  >
>  > So in v4:
>  > - Do incrementals and fulls differ in how/when checksums are used?
>  > - For each case, what situations would cause BackupPC to be fooled?
>  > - Specifically, I don't understand the comment of replacing --checksum
>  >   with --ignore-times since the rsync definition of --checksum
>  >   says that it doesn't look at times but a 128-bit file checksum.
>  >
>  > The reason I ask is that I recompiled a debian package (happens to be
>  > libbackuppc-xs-perl) to pull in the latest version 0.60. But I forgot
>  > to change the date in the Changelog. When installing the package, the
>  > file dates were the same even though the content and file md5sums for
>  > some files had changed.
>  >
>  > Specifically,
>  > /usr/lib/x86_64-linux-gnu/perl5/5.26/auto/BackupPC/XS/XS.so
>  > had the same size (and date due to my mistake) but a different file
>  > md5sum.
>  >
>  > And an incremental backup didn't detect this difference...
>  >
>  >
>  > _______________________________________________
>  > BackupPC-users mailing list
>  > BackupPC-users@lists.sourceforge.net
>  > List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
>  > Wiki:    http://backuppc.wiki.sourceforge.net
>  > Project: http://backuppc.sourceforge.net/
>
