On 09/19/2013 08:46 AM, hru...@gmail.com wrote:

From time to time I think I should follow Kenneth Westerback's recommendation and go to a math-for-idiots list, for example to Usenet Group "sci.math", and then make a link to this thread in gmane: they will sure admire Marc Espie's wisdom and his efforts teaching idiots like me.

That seems like a useful exercise for you to do. Like Marc said very early on, rsync is based in part on Andrew Tridgell's PhD Thesis, "Efficient Algorithms for Sorting and Synchronization." You can find it and read it at http://www.samba.org/~tridge/phd_thesis.pdf.

A little more searching might also lead you to http://www.big.info/2013/04/md5-hash-collision-probability-using.html which tries to answer your exact question. It also points at http://en.wikipedia.org/wiki/Birthday_attack where you'll see pretty much your exact questions answered. The probability of an MD5 collision for two 4 TB files is about 10^(-12). (MD5 is a 128-bit hash, used by modern rsync rather than MD4; this figure ignores the 16-bit rolling signature.)
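For anyone who wants to redo this kind of arithmetic, here is a rough sketch of the standard birthday-bound approximation described on that Wikipedia page. It is not the calculation behind the figure above; the function name and the block count in the usage line are assumptions made up for illustration, and the result depends heavily on how many hashed blocks you assume.

```python
import math

def collision_probability(n_items: int, space_size: int) -> float:
    """Birthday approximation: chance that at least two of n_items
    uniformly random values drawn from space_size possibilities collide.

    Uses p ~= 1 - exp(-n(n-1) / (2 * space_size)).
    """
    pairs = n_items * (n_items - 1) // 2   # number of distinct pairs (exact integer)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny
    return -math.expm1(-pairs / space_size)

# Hypothetical usage: 10**10 hashed blocks (an assumed count) against a 128-bit hash.
n_blocks = 10**10
print(collision_probability(n_blocks, 2**128))
```

The approximation assumes the hash output is uniformly distributed, which says nothing about deliberately constructed MD5 collisions.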

That's approximately on par with the likelihood of the hard drive reading a bit wrong after you're done using rsync (per Christian Weisberger). However, that's ignoring the rolling signature. In fact, you need to have both the rolling signature (16 bits) *and* the MD5 hash match at the same time. The probability of both combined is right about 10^(-15) of a hard drive read error.
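To illustrate why requiring both checks to match helps, here is a minimal sketch of the per-comparison arithmetic. It assumes the rolling signature and the MD5 hash behave as independent, uniformly distributed values, and it takes the 16-bit and 128-bit widths from the text above; the function name is hypothetical.

```python
from fractions import Fraction

def false_match_probability(weak_bits: int, strong_bits: int) -> Fraction:
    """Chance that one pair of different blocks matches on BOTH checks,
    assuming the two checks are independent and uniformly distributed."""
    weak = Fraction(1, 2 ** weak_bits)      # rolling signature matches by chance
    strong = Fraction(1, 2 ** strong_bits)  # strong hash matches by chance
    return weak * strong                    # independence: probabilities multiply

# Widths as stated in this message: 16-bit rolling signature, 128-bit MD5.
print(float(false_match_probability(16, 128)))
```

This is the probability for a single comparison only; scaling it to whole files means accounting for how many block comparisons are made.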

That is all of the math. The references and documents are right there. If you are still worried about it, you are trolling either misc@ or yourself or both.
--
Matthew Weigel
hacker
unique & idempot . ent
