Matija Grabnar wrote:
> If you examine the mathematical theory: no matter how good the
> checksum algorithm, if your checksum is smaller than the files you are
> calculating it over (and it usually is), then a large number of objects
> (approximately size of max object / size of the checksum) will map to
> the same checksum.

I understand.

> Please, please, if you discover that two mime parts have the same
> checksum, and the same size,
> please check that they are really equal (fetch the saved one from the
> storage and compare for equality). Such comparisons won't happen often
> (checksum collisions are designed
> to be rare), so it should not cost a lot of computing time. But if you
> don't do it, sooner or later
> you will lose a mime part that was truly different from what was already
> stored, and your users
> (and my users!) will be furious.

Checksum collisions are expected here: inserting the same message twice
will, by design, produce collisions on the header and body parts. I need
to establish a reliable unique primary key that identifies a mime part.
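
A minimal sketch of the insert path being discussed, using an in-memory
dict as a stand-in for the database, showing the verification step you
ask for: on a key hit, fetch the stored part and compare bytes before
treating it as a duplicate (the helper name and collision handling are
illustrative, not DBmail's actual code):

```python
import hashlib

def insert_part(storage: dict, data: bytes) -> str:
    """Store a mime part keyed by its digest, verifying on collision.

    `storage` stands in for the database; keys are hex digests.
    """
    key = hashlib.sha1(data).hexdigest()
    existing = storage.get(key)
    if existing is None:
        storage[key] = data  # first time we see this digest
    elif existing != data:
        # Same digest, different bytes: a genuine collision.
        # Never silently dedupe; this must be handled explicitly.
        raise ValueError("sha1 collision on key " + key)
    # existing == data: a true duplicate, nothing new to store
    return key
```

The byte comparison only runs when a key already exists, so for honest
duplicates it costs one fetch, and for new parts it costs nothing.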

What we don't want is collisions on mime parts that are actually
different. Using sha1 alone is debatable, I understand. It looks like
using tiger would greatly reduce the risk of unintentional collisions.

Of course we could do a double digest, a sha1 plus a tiger, and combine
those into a single key. What would the chances of an accidental
collision be then?
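
A rough sketch of what such a combined key could look like. Python's
standard hashlib has no Tiger implementation, so SHA-256 stands in for
the second digest here purely for illustration; a real deployment would
substitute an actual Tiger digest:

```python
import hashlib

def part_key(data: bytes) -> str:
    # Concatenate two independent digests into one primary key.
    # SHA-256 is a stand-in for Tiger, which hashlib lacks.
    first = hashlib.sha1(data).hexdigest()    # 160 bits, 40 hex chars
    second = hashlib.sha256(data).hexdigest() # 256 bits, 64 hex chars
    return first + second
```

Against random (non-adversarial) collisions, concatenating two digests
behaves roughly like a single digest with the combined bit length, so the
accidental-collision probability becomes vanishingly small; it does not
add the same margin against deliberate attacks on either algorithm.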


-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail