Those were very good articles. Thank you for enlightening me.

On Thu, Sep 19, 2013 at 01:49:05PM -0500, Matthew Weigel wrote:
> On 09/19/2013 08:46 AM, hru...@gmail.com wrote:
> 
> >From time to time I think I should follow Kenneth Westerback's
> >recommendation and go to a math-for-idiots list, for example the
> >Usenet group "sci.math", and then post a link to this thread on
> >gmane: they will surely admire Marc Espie's wisdom and his efforts
> >teaching idiots like me.
> 
> That seems like a useful exercise for you to do.  Like Marc said
> very early on, rsync is based in part on Andrew Tridgell's PhD
> Thesis, "Efficient Algorithms for Sorting and Synchronization."  You
> can find it and read it at
> http://www.samba.org/~tridge/phd_thesis.pdf.
> 
> A little more searching might also lead you to
> http://www.big.info/2013/04/md5-hash-collision-probability-using.html
> which tries to answer your exact question.  It also points at
> http://en.wikipedia.org/wiki/Birthday_attack where you'll see pretty
> much your exact questions answered.  The probability of a collision
> of MD5, a 128-bit hash (used by modern rsync rather than MD4;
> ignoring the 16-bit rolling signature), for two 4TB files is about
> 10^(-12).
> 
> That's approximately on par with the likelihood of the hard drive
> reading a bit wrong after you're done using rsync (per Christian
> Weisberger). However, that's ignoring the rolling signature.  In
> fact, you need to have both the rolling signature (16 bits) *and*
> the MD5 hash match at the same time.  The probability of both
> combined is right about 10^(-15) of a hard drive read error.
> 
> That is all of the math.  The references and documents are right
> there.  If you are still worried about it, you are trolling either
> misc@ or yourself or both.
> -- 
> Matthew Weigel
> hacker
> unique & idempot . ent
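
For anyone who wants to redo the arithmetic quoted above, here is a
minimal sketch of the usual birthday-bound approximation,
p ~= 1 - exp(-n(n-1)/2^(b+1)) for n random b-bit hashes. The function
names and the block count in the example are illustrative assumptions
only; they are not taken from rsync or from the linked articles. The
second function multiplies in a weak checksum of the width mentioned
in the quoted mail (16 bits), assuming the two checks are independent
and uniformly distributed.

```python
import math


def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items random
    hash_bits-bit values collide (standard birthday approximation)."""
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


def combined_false_match(p_strong: float, weak_bits: int) -> float:
    """Chance that a weak checksum of weak_bits bits also matches,
    assuming it is independent of the strong hash and uniform."""
    return p_strong * 2.0 ** -weak_bits


if __name__ == "__main__":
    # Hypothetical number of hashed blocks; pick whatever your
    # file size and block size actually give you.
    n_blocks = 2 ** 26
    p_md5 = birthday_collision_probability(n_blocks, 128)
    p_both = combined_false_match(p_md5, 16)
    print(f"strong hash only:     {p_md5:.3e}")
    print(f"strong hash and weak: {p_both:.3e}")
```

The result depends heavily on what you count as an "item" (blocks,
bytes, window positions), so treat the output as an order-of-magnitude
estimate rather than a reproduction of the figures quoted above.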

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
