Those were very good articles. Thank you for enlightening me.

On Thu, Sep 19, 2013 at 01:49:05PM -0500, Matthew Weigel wrote:
> On 09/19/2013 08:46 AM, hru...@gmail.com wrote:
> > From time to time I think I should follow Kenneth Westerback's
> > recommendation and go to a math-for-idiots list, for example the
> > Usenet group "sci.math", and then post a link to this thread in
> > gmane: they will surely admire Marc Espie's wisdom and his efforts
> > teaching idiots like me.
>
> That seems like a useful exercise for you to do. As Marc said
> very early on, rsync is based in part on Andrew Tridgell's PhD
> thesis, "Efficient Algorithms for Sorting and Synchronization." You
> can find it and read it at
> http://www.samba.org/~tridge/phd_thesis.pdf.
>
> A little more searching might also lead you to
> http://www.big.info/2013/04/md5-hash-collision-probability-using.html
> which tries to answer your exact question. It also points to
> http://en.wikipedia.org/wiki/Birthday_attack where you'll see pretty
> much your exact questions answered. The probability of an MD5
> collision (MD5 is a 128-bit hash, used by modern rsync instead of
> MD4; this ignores the 16-bit rolling signature) for two 4 TB files
> is about 10^(-12).
>
> That's approximately on par with the likelihood of the hard drive
> reading a bit wrong after you're done using rsync (per Christian
> Weisberger). However, that's ignoring the rolling signature. In
> fact, you need to have both the rolling signature (16 bits) *and*
> the MD5 hash match at the same time. The probability of both
> combined is right about 10^(-15) of a hard drive read error.
>
> That is all of the math. The references and documents are right
> there. If you are still worried about it, you are trolling either
> misc@ or yourself or both.
> --
> Matthew Weigel
> hacker
> unique & idempot . ent
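For anyone who wants to redo the arithmetic: the Birthday_attack article linked above uses the approximation p = 1 - exp(-n(n-1) / 2^(b+1)) for the chance that at least two of n random b-bit hashes collide. Below is a minimal Python sketch of that formula, plus the single-pair probability of two checksums matching at once. It assumes the checksums behave like independent, uniformly random values, which is an idealization and not something rsync guarantees. It does not try to reproduce the 10^(-12) figure quoted above, since that depends on how many blocks are compared; the block count in the example is purely hypothetical.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday-bound probability that at least two of n_items
    uniformly random hashes of the given width collide:
        p = 1 - exp(-n(n-1) / 2^(bits+1))
    math.expm1 keeps precision when p is extremely small."""
    pairs = n_items * (n_items - 1) / 2  # number of distinct pairs
    return -math.expm1(-pairs / 2.0 ** bits)


def pair_match_probability(*bit_widths: int) -> float:
    """Probability that one given pair of blocks matches on every
    checksum at once, ASSUMING the checksums are independent and
    uniformly random with the given widths in bits."""
    return 2.0 ** -sum(bit_widths)


# Hypothetical example: 2**32 blocks, 128-bit hash -> roughly 2**-65 (about 2.7e-20).
print(collision_probability(2**32, 128))
# 16-bit rolling signature and 128-bit MD5 both matching: 2**-144 under the assumption above.
print(pair_match_probability(16, 128))
```

Multiplying the per-checksum probabilities is only valid under the independence assumption; that is what makes requiring both checksums to match so much stronger than either one alone.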
--

/ Raimo Niskanen, Erlang/OTP, Ericsson AB