On Tuesday, April 18, Larry Jones wrote:
>
> Indeed. Since the rsync algorithm uses two different checksums over
> fixed-size blocks (rather than a single algorithm across an entire
> file), the chances of errors are greatly reduced (from microscopic to
> infinitesimal) and I'd say enhancing CVS to use it would be a far more
> valuable change than just using checksums in addition to (or instead of)
> timestamps.
Unless you can point me at a definitive article that explains the coding
and complexity theory behind using 2 different algorithms, and that
proves that it actually does reduce the chance of errors, I'm going to
say that the *BEST* you can do is as well as a single algorithm with
N+M bits worth of a sum. In other words, I'm sure I can replace the
2 algorithms with 1 having the same "chance of errors" properties.
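To make the point concrete: assuming ideal (uniformly distributed,
independent) checksums, the odds of two distinct files agreeing on both
an N-bit sum and an M-bit sum are exactly the odds of agreeing on one
(N+M)-bit sum. Back-of-envelope sketch (the bit widths below are
made-up numbers for illustration, not rsync's actual sizes):

```python
# Under an ideal-hash assumption, a random pair of distinct inputs
# collides on an N-bit sum with probability 2**-N.  Colliding on two
# independent sums multiplies the probabilities, which is the same as
# colliding on a single sum of N+M bits.
N, M = 32, 128  # hypothetical widths of the two sums

p_two_sums = 2.0 ** -N * 2.0 ** -M   # pass both checks by chance
p_one_sum = 2.0 ** -(N + M)          # pass one wider check by chance

assert p_two_sums == p_one_sum
```

So under that idealized model the two schemes are interchangeable; any
real difference would have to come from non-ideal behavior of the
specific algorithms, which is exactly what I'd want the article to prove.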
Also, the "rsync algorithm" uses 2 sums. One of them is MD5, so in
some sense, rsync is at least as good as us using MD5. The other
algorithm is optimized for speed, not "collision" detection. It is
*not* particularly resistant to "chance of error"; that is why the
author used MD5 as the "authoritative sum" within his algorithm.
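For reference, the speed-optimized sum works roughly like this. This is
a simplified Python sketch of an rsync-style weak rolling sum, not the
exact algorithm; the modulus and window size here are arbitrary. The
whole point of its design is the O(1) slide, not collision resistance:

```python
M = 1 << 16  # modulus for both halves of the weak sum (arbitrary here)

def weak_sum(block):
    """Compute the two-part weak sum (a, b) over a whole block."""
    a = b = 0
    n = len(block)
    for i, byte in enumerate(block):
        a = (a + byte) % M            # plain byte sum
        b = (b + (n - i) * byte) % M  # position-weighted sum
    return a, b

def roll(a, b, old, new, n):
    """Slide an n-byte window forward one byte in O(1).

    This constant-time update is what the sum is optimized for;
    nothing about it resists deliberate or accidental collisions.
    """
    a = (a - old + new) % M
    b = (b - n * old + a) % M
    return a, b

# Rolling from one window to the next matches a from-scratch recompute.
data = bytes(range(64))
n = 16
a, b = weak_sum(data[:n])
a, b = roll(a, b, data[0], data[n], n)
assert (a, b) == weak_sum(data[1:n + 1])
```

A cheap 32-bit sum like this is fine for *finding candidate* matching
blocks quickly; it is the strong sum that actually vouches for them.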
Also, the rsync algorithm does not help much here. It makes a lot of
sense in a "transmission" scenario. In a "checksum checking" scenario,
it makes less sense. In some sense, checksums are meant to be
"fixed size" representations of a file. Quick to look up, quick to
manage, compare, etc.
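For that kind of use, a single strong sum is all you need: one
fixed-size value per file, stored once, compared cheaply. A minimal
Python sketch using the standard hashlib module (the chunk size is an
arbitrary choice, not anything CVS-specific):

```python
import hashlib

def md5_of(path):
    """Return a fixed-size representation of a file's contents.

    Reading in chunks keeps memory use constant regardless of file
    size; two files are "the same" iff their digests compare equal
    (up to MD5's collision probability).
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing two stored digests is a fixed-cost string compare, which is
exactly the "quick to look up, quick to manage" property above.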
--Toby.