John Abreau <abre...@gmail.com> writes:

> I've heard of tools using MD5 or SHA1 hashes to identify duplicates, and
> potential issues with hash collisions causing false positives.

By accident or maliciously? The numbers seem off for accidental
collisions. An md5 sum is a 128-bit value, normally printed as 32 hex
digits. That gives 2^128 =
340282366920938463463374607431768211456 potential hash sums (or does the
algorithm offer only a smaller subset?). I'm not going to bother to
compute the probability of a collision. It's a very remote possibility,
yes? For the malicious case, if someone's able to mess with the hashes
used by deduplication code in your file system, or in your hopefully
almost as good userland equivalent (which of course must use git in some
way or another, for reasons that are not clear to me), you have
unsolvable problems.
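
For anyone who does want a number for the accidental case, here is a
rough sketch using the usual birthday-bound approximation. It assumes
md5 output is spread uniformly over all 2^128 values, which is an
assumption and not something settled above. The function name and the
file counts are made up for illustration, and it says nothing about
the malicious case.

```python
import math

MD5_BITS = 128  # md5 digest size: 16 bytes, i.e. 32 hex digits


def collision_probability(num_files: int, hash_bits: int = MD5_BITS) -> float:
    """Approximate chance that at least two of num_files hashes collide.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / (2N)),
    with N = 2**hash_bits. ASSUMPTION: hashes are uniformly distributed.
    """
    space = 2 ** hash_bits                      # N, number of possible sums
    pairs = num_files * (num_files - 1) // 2    # number of distinct file pairs
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-pairs / space)


if __name__ == "__main__":
    # Hypothetical file counts, chosen only to show the scale.
    for n in (10**6, 10**9, 10**12):
        print(f"{n:>15,} files: p ~= {collision_probability(n):.3e}")
```

Under that uniformity assumption, even a billion files come out well
below one chance in 10^20, so the accidental case does look remote.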

I once saw a pointer to a thread speculating about the problem for git
(when actually used for source code, go figure), but the hypothetical
attack needed a hostile committer. It wasn't an easy attack even then,
though that might have been in part because of the social engineering
challenge of keeping other committers from noticing what you've
done.

-- 
Mike Small
sma...@sdf.org
_______________________________________________
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss
