Dear diary, on Tue, Apr 12, 2005 at 12:40:21AM CEST, I got a letter
where Pedro Larroy <[EMAIL PROTECTED]> told me that...
> Hi

Hello,

> I had a quick look at the source of GIT tonight, I'd like to warn you
> about the use of hash functions as content indexers.
> 
> As probably you are aware, hash functions such as SHA-1 are surjective not
> bijective (1-to-1 map), so they have collisions. Here one can argue
> about the low probability of such a collision, I won't get into
> subjetive valorations of what probability of collision is tolerable for
> me and what is not. 
> 
> I my humble opinion, choosing deliberately, as a design decision, a
> method such as this one, that in some cases could corrupt data in a
> silent and very hard to detect way, is not very good. One can also argue
> that the probability of data getting corrupted in the disk, or whatever
> could be higher than that of the collision, again this is not valid
> comparison, since the fact that indexing by hash functions without
> additional checking could make data corruption legal between the
> reasonable working parameters of the program is very dangerous.

(i) 1461501637330902918203684832716283019655932542976 possible SHA1 hashes.

(ii) In git-pasky, there's (turnable off) detection of collisions.

(iii) Your argument against comparing with the probability of a hardware
error does not make sense to me.

(iv) You fail to propose a better solution.

-- 
                                Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
98% of the time I am right. Why worry about the other 3%.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to