Matija Grabnar wrote:
> I re-iterate: regardless of which digest algorithm is chosen, the code
> MUST be able to
> detect and correctly handle collisions. Collisions WILL occur,
> regardless of the algorithm
> chosen. It is a mathematically provable fact.

For those of you who have been following this discussion: I've
implemented this.

- we now use the cryptographic hash only to quickly locate possibly
duplicate mime-parts. If the hash doesn't occur yet, a new mimepart is
stored with the hash, and an auto-increment bigint is generated as its
primary key. If the hash does occur, the insertion code compares the
blobs, so that different blobs that happen to share a hash are never
treated as duplicates.

- I've added support for a whole dumpload of hashes: we now support md5,
sha1, sha256, sha512, tiger and whirlpool. Since I'm relying on mhash
for this, it would be trivial to add other hashes like gost, but I'm
currently restricting things to the ones documented on the NESSIE (EU)
pages. Looking back, adding all these was probably not really necessary
for single-instance storage, but libmhash is rock-solid and widely
available, and I have a hunch they might come in handy along the road.
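
To make the insertion logic described above concrete, here is a minimal
sketch in Python. It is not the dbmail code: the table layout, the names
open_store and store_part, and the use of sqlite3 and hashlib in place of
the real database backend and libmhash are all assumptions made for
illustration. The digest argument is a hypothetical hook that exists only
so a collision can be simulated.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory store (hypothetical schema, not dbmail's)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mimeparts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # stand-in for the bigint key
        " hash TEXT NOT NULL,"                    # lookup aid only, not unique
        " data BLOB NOT NULL)"
    )
    db.execute("CREATE INDEX mimeparts_hash ON mimeparts (hash)")
    return db


def store_part(db, blob, algorithm="sha256", digest=None):
    """Store blob once and return the id of the row that holds it.

    The hash only narrows the search. A row is reused only when its
    stored bytes are identical to blob, so two different blobs that
    share a hash end up in two separate rows.
    """
    if digest is None:
        # hashlib.new accepts names such as md5, sha1, sha256, sha512;
        # tiger and whirlpool are not guaranteed to be available here.
        digest = hashlib.new(algorithm, blob).hexdigest()

    candidates = db.execute(
        "SELECT id, data FROM mimeparts WHERE hash = ?", (digest,)
    ).fetchall()
    for part_id, data in candidates:
        if bytes(data) == blob:
            return part_id  # true duplicate: reuse the existing row

    # Either the hash is new or every candidate was a collision.
    cur = db.execute(
        "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (digest, blob)
    )
    return cur.lastrowid
```

Calling store_part twice with the same bytes returns the same id, while
two different blobs passed with the same forced digest get two distinct
ids. Because the hash column is indexed but not unique, the choice of
algorithm affects only how often the byte-for-byte comparison has to run,
not correctness.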


-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
