Dear diary, on Tue, Apr 12, 2005 at 12:40:21AM CEST, I got a letter where Pedro Larroy <[EMAIL PROTECTED]> told me that... > Hi
Hello, > I had a quick look at the source of GIT tonight, I'd like to warn you > about the use of hash functions as content indexers. > > As probably you are aware, hash functions such as SHA-1 are surjective not > bijective (1-to-1 map), so they have collisions. Here one can argue > about the low probability of such a collision, I won't get into > subjetive valorations of what probability of collision is tolerable for > me and what is not. > > I my humble opinion, choosing deliberately, as a design decision, a > method such as this one, that in some cases could corrupt data in a > silent and very hard to detect way, is not very good. One can also argue > that the probability of data getting corrupted in the disk, or whatever > could be higher than that of the collision, again this is not valid > comparison, since the fact that indexing by hash functions without > additional checking could make data corruption legal between the > reasonable working parameters of the program is very dangerous. (i) 1461501637330902918203684832716283019655932542976 possible SHA1 hashes. (ii) In git-pasky, there's (turnable off) detection of collisions. (iii) Your argument against comparing with the probability of a hardware error does not make sense to me. (iv) You fail to propose a better solution. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ 98% of the time I am right. Why worry about the other 3%. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/