On Thu, 2007-10-11 at 12:28 +0100, Simon Peyton-Jones wrote: > Interesting! The associativity property is the kind of thing I was after. > Except that I don't really care if FP(as ++ bs) = FP(as) `plusFP` FP(bs). I > only care that the latter is robust in the sense of having low probabilty of > collision. So the Rabin scheme does more than I really need (which is > probably fine). > > On page 1 he draws a distinction between fingerprinting and hashing, which > relates to my last wish. Does one need a different technique for the two, or > is it enough to reduce the number of bits in the Rabin scheme, and thereby > get a hashing scheme? > > Strangely the paper gives no comparison with other fingerprinting schemes. > > Simon > | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lauri Alanko > | > | If compositionality is important, at least Rabin's fingerprints are > | worth considering: http://citeseer.ist.psu.edu/broder93some.html
Note that Rabin's fingerprint algorithm is the same as CRC (http://en.wikipedia.org/wiki/Cyclic_redundancy_check). I'm not aware of people using CRCs for hashing inside a program, but they should work fine and be fairly fast. I can think of two caveats. 1) With CRC and a fixed polynomial, it is very easy for an "adversary" to pick multiple inputs that give the same fingerprint/hash code. (Rabin avoids this by picking a random polynomial after the adversary has selected the inputs.) Depending on the application, this may be very important, or it may not matter at all. 2) The straightforward approach for "instance Fingerprintable Fingerprint" works very poorly for CRCs. Considering balanced binary trees with four leaves, and the obvious compositional hash function, then for all a,b,c,d we would have fingerprint ((a,b),(c,d)) == fingerprint ((a,c),(b,d)), and fingerprint ((a,b),(b,d)) == fingerprint ((a,c),(c,d)). To avoid this, the fingerprint would have to include not only the remainder, but the number of bits involved in creating the fingerprint. Then fingerprinting a fingerprint would require time logarithmic in this number of bits. Carl Witty _______________________________________________ Haskell mailing list Haskell@haskell.org http://www.haskell.org/mailman/listinfo/haskell