On 09/19/2013 08:46 AM, hru...@gmail.com wrote:
> From time to time I think I should follow Kenneth Westerback's
> recommendation and go to a math-for-idiots list, for example to the
> Usenet group "sci.math", and then make a link to this thread in gmane:
> they will surely admire Marc Espie's wisdom and his efforts teaching
> idiots like me.
That seems like a useful exercise for you to do. Like Marc said very
early on, rsync is based in part on Andrew Tridgell's PhD Thesis,
"Efficient Algorithms for Sorting and Synchronization." You can find it
and read it at http://www.samba.org/~tridge/phd_thesis.pdf.
A little more searching might also lead you to
http://www.big.info/2013/04/md5-hash-collision-probability-using.html
which tries to answer your exact question. It also points at
http://en.wikipedia.org/wiki/Birthday_attack where you'll see pretty
much your exact questions answered. The probability of an MD5 collision
(MD5 is a 128-bit hash, used by modern rsync rather than MD4) for two 4TB
files, ignoring the 16-bit rolling signature, is about 10^(-12).
That's approximately on par with the likelihood of the hard drive
reading a bit wrong after you're done using rsync (per Christian
Weisberger). However, that's ignoring the rolling signature. In fact,
you need to have both the rolling signature (16 bits) *and* the MD5 hash
match at the same time. The probability of both combined is right about
10^(-15) of a hard drive read error.
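For anyone who wants to check the order of magnitude themselves, here is a
minimal Python sketch of the birthday-bound approximation described in the
Wikipedia article above. The function names, the block count, and the idea
of treating the two checksums as one 144-bit value are assumptions made for
illustration, not rsync's actual code. The numbers it prints depend on the
block count you assume, so they need not match the figures quoted above.

```python
import math


def collision_probability(n_items: int, space_size: int) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * space_size))."""
    pairs = n_items * (n_items - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / space_size)


def hash_collision_probability(n_items: int, bits: int) -> float:
    """Same approximation for an ideal hash with `bits` bits of output."""
    return collision_probability(n_items, 2 ** bits)


if __name__ == "__main__":
    # Hypothetical block count: a 4 TiB file cut into 128 KiB blocks.
    n_blocks = 2 ** 42 // 2 ** 17
    strong = hash_collision_probability(n_blocks, 128)  # MD5 alone
    # 16 is the rolling-signature width quoted in this message.
    combined = hash_collision_probability(n_blocks, 128 + 16)
    print(f"MD5 only:        {strong:.3e}")
    print(f"MD5 and rolling: {combined:.3e}")
```

As a sanity check on the formula, collision_probability(23, 365) comes out
at roughly 0.5, which is the classic birthday result.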
That is all of the math. The references and documents are right there.
If you are still worried about it, you are trolling either misc@ or
yourself or both.
--
Matthew Weigel
hacker
unique & idempot . ent