On Tue, Dec 18, 2007, Paul J Stevens <[EMAIL PROTECTED]> said:

> Matija Grabnar wrote:
>> If you examine the mathematical theory, no matter how good the checksum
>> algorithm, if your checksum number is smaller than the files you are
>> calculating it over (and it usually is), then you will have a large
>> number (approximately size of max object / size of the checksum)
>> objects which will result in the same checksum.
>
> I understand.
>
>> Please, please, if you discover that two mime parts have the same
>> checksum, and the same size, please check that they are really equal
>> (fetch the saved one from the storage and compare for equality). Such
>> comparisons won't happen often (checksum collisions are designed to be
>> rare), so it should not cost a lot of computing time. But if you don't
>> do it, sooner or later you will lose a mime part that was truly
>> different from what was already stored, and your users (and my users!)
>> will be furious.
>
> Checksum collisions must happen because inserting the same message twice
> will result in collisions on the header and body part. That is by
> design. I need to establish a reliable unique primary key that will
> identify a mime part.
>
> What we don't want is collisions to happen on mime-parts that are
> different. Using sha1 is debatable, I understand. It looks like using
> tiger will greatly reduce the risk of unintentional collision.
>
> Of course we could do a double digest - a sha1 plus a tiger, and combine
> those into a single key. What would be the chances of accidental
> collisions then?

I'm a big fan of double digest. I don't think the algorithm matters much,
just as long as the two are very different. Even just MD5 and SHA1 should
be plenty good, IMHO.

Aaron
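
To make the idea concrete, here is a rough sketch of a double-digest key combined with the equality check Matija asked for. It is only an illustration, not dbmail code: `part_key`, `PartStore`, and the in-memory dict are hypothetical names invented for this example, and it uses MD5 plus SHA-1 because Python's standard `hashlib` does not ship Tiger.

```python
import hashlib


def part_key(data: bytes) -> str:
    """Concatenate two unrelated digests (MD5 + SHA-1) into one key."""
    return hashlib.md5(data).hexdigest() + hashlib.sha1(data).hexdigest()


class PartStore:
    """Hypothetical in-memory stand-in for the mime-part table."""

    def __init__(self, key_func=part_key):
        self._key_func = key_func
        # key -> list of distinct payloads that happen to share that key
        self._parts = {}

    def store(self, data: bytes):
        """Store a part and return (key, index) identifying it.

        An identical part is deduplicated; a different part with the same
        key is kept as a separate entry instead of being silently dropped.
        """
        key = self._key_func(data)
        bucket = self._parts.setdefault(key, [])
        for index, existing in enumerate(bucket):
            # Never trust the digest alone: compare the actual bytes.
            if existing == data:
                return key, index
        bucket.append(data)
        return key, len(bucket) - 1

    def fetch(self, key: str, index: int) -> bytes:
        return self._parts[key][index]
```

The byte comparison only runs when the key already exists, so for new parts it costs nothing, and for duplicates it costs one read of the stored part. With that check in place, a digest collision between two different parts costs an extra row rather than a lost attachment.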
