On Sat, Sep 14, 2013 at 04:13:41PM +0000, hru...@gmail.com wrote:
> Marc Espie <es...@nerim.net> wrote:
> 
> > On Sat, Sep 14, 2013 at 03:09:48PM +0000, hru...@gmail.com wrote:
> >
> > > A completely other thing is to conclude that two *arbitrary* pieces of
> > > data are the same only because they have the same hash. Arbitrary 
> > > means here that the one was not a copy of the other. And this is what
> > > rsync seems to do as far as I understand the wikipedia web-page.
> >
> > The probability of an electrical failure in your hard drive causing
> > it to munge the file, or of a bug in the software using that file
> > is much higher than this happening.
> 
> This is a conjecture. Do you have a proof that the probability is so
> small? For me it is difficult to accept it. Is this conjecture used
> elsewhere?

A similar application is the Git version control system, which is
based on the assumption that every content blob can be uniquely
described by its 160-bit SHA-1 hash value. If two blobs have
the same hash value they are assumed to be identical.
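A minimal sketch of that content addressing in Python (standard
hashlib only), assuming Git's blob object format of a "blob <size>"
header, a NUL byte and then the raw bytes:

    import hashlib

    def git_blob_sha1(content: bytes) -> str:
        """Compute the SHA-1 object id Git assigns to a blob with this content."""
        # Git hashes the header "blob <size>\0" followed by the raw bytes,
        # so identical content always maps to the same object id.
        header = b"blob %d\x00" % len(content)
        return hashlib.sha1(header + content).hexdigest()

    # Should match `echo 'hello world' | git hash-object --stdin`.
    print(git_blob_sha1(b"hello world\n"))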

If SHA-1 were a perfect cryptographic hash, the probability of a
mistake would be 2^-160, which (by the old MB vs MiB rule of thumb
that 10 bits correspond to roughly 3 decimal digits) translates to
around 10^-48.

According to a previous post in this thread, the probability of a
disk bit error on a 4 TB hard drive is around 10^-15, so the
SHA-1 hash value wins by a factor of about 10^33, which is a big
margin. It can be 10^33 times worse than perfect and still beat
the hard drive error probability.
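As a quick back-of-the-envelope check of those numbers (a sketch only;
the 10^-15 disk error rate is simply taken from the earlier post):

    from math import log10

    hash_bits = 160                  # SHA-1 digest length
    p_collision = 2.0 ** -hash_bits  # ideal-hash chance that two arbitrary blobs collide
    p_disk_error = 1e-15             # error rate quoted earlier in the thread

    print("collision probability  ~ 10^%.0f" % log10(p_collision))
    print("disk error probability ~ 10^%.0f" % log10(p_disk_error))
    print("margin                 ~ 10^%.0f" % log10(p_disk_error / p_collision))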

Now you can read what you can find about cryptographic
hash algorithms to convince yourself that the algorithms
used by rsync and/or Git are good enough. I just gave up
and went with the flow when I arrived at the figure of 10^33
times better than hard disk errors...

The assumption that cryptographic hash functions are reliable, as
their definition requires, is heavily used today, at least by rsync
and Git, and there must be a lot of intelligent and hopefully
skilled people backing that up.

> 
> About my original intention: to get a copy of the repository. Does the 
> repository only grow with new files? Old files never change? Can I 
> hence expect that cvsync never relies on the above questionable conjecture?
> Even if the transmission for whatever reason is interrupted and I try
> again?
> 
> Is there an alternative for downloading the repository without the
> conjecture?

If you download using any such means, you had better verify the content
afterwards with a good method, e.g. SHA-256 checksums. Just a tip.
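For example, a minimal verification sketch in Python (the file name
and the expected digest below are placeholders, not real values):

    import hashlib

    def sha256_of_file(path, chunk_size=1 << 20):
        """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # Compare against a digest published by the mirror you trust.
    expected = "<digest published by the mirror>"
    actual = sha256_of_file("cvs-repo.tar.gz")
    print("OK" if actual == expected else "MISMATCH")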

> 
> I don't like rsync and similar tools!!!!
> 
> Thanks
> Rodrigo.

-- 

/ Raimo Niskanen, Erlang/OTP, Ericsson AB
