Matija Grabnar wrote:
> I re-iterate: regardless of which digest algorithm is chosen, the code
> MUST be able to detect and correctly handle collisions. Collisions WILL
> occur, regardless of the algorithm chosen. It is a mathematically
> provable fact.

For those of you who have been following this discussion: I've done this.

- We now use the cryptographic hash only to quickly locate possibly
  duplicate mime-parts. If the hash doesn't occur yet, a new mimepart is
  stored under that hash, with an auto-increment bigint generated as its
  primary key. If the hash does occur, the insertion code compares the
  blobs to make sure a hash collision between different blobs is not
  mistaken for a duplicate.

- I've added support for a whole load of hashes: we now support md5,
  sha1, sha256, sha512, tiger and whirlpool. Since I'm relying on mhash
  for this, it would be trivial to add other hashes like gost, but I'm
  currently restricting things to the ones documented on the NESSIE (EU)
  pages.

Looking back, adding all these was probably not really necessary for
single-instance storage, but libmhash is rock-solid and widely available,
and I have a hunch they might come in handy down the road.

--
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
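
The lookup-then-compare insertion described above can be sketched roughly as follows. This is an illustration only, not the actual dbmail code: it uses Python with SQLite and hashlib instead of C and libmhash, and the table name `mimeparts`, its columns, and the helpers `open_store`, `sha256_hex` and `store_part` are all made up for the example.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory store (hypothetical schema, not dbmail's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mimeparts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # auto-increment key
        " hash TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    # Deliberately NOT a unique index: two different blobs that collide
    # on the same hash must be able to coexist as separate rows.
    conn.execute("CREATE INDEX mimeparts_hash ON mimeparts (hash)")
    return conn


def sha256_hex(data):
    """Default digest; any function mapping bytes to a string would do."""
    return hashlib.sha256(data).hexdigest()


def store_part(conn, data, digest=sha256_hex):
    """Store a blob once and return its id.

    The hash is only used to find candidate duplicates quickly; the
    blobs themselves are compared before a row is reused.
    """
    key = digest(data)
    candidates = conn.execute(
        "SELECT id, data FROM mimeparts WHERE hash = ?", (key,)
    ).fetchall()
    for part_id, stored in candidates:
        if bytes(stored) == data:
            return part_id  # genuine duplicate: reuse the existing row
    # Either the hash is new, or it collided with a different blob.
    cur = conn.execute(
        "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (key, data)
    )
    conn.commit()
    return cur.lastrowid
```

Passing a deliberately bad `digest` (for instance one that returns a constant string) is an easy way to exercise the collision path: distinct blobs then still receive distinct ids, while a repeated blob maps back to its existing row.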
