On Tue, Dec 18, 2007, Paul J Stevens <[EMAIL PROTECTED]> said:

> Matija Grabnar wrote:
>> If you examine the mathematical theory, no matter how good the checksum
>> algorithm, if your checksum is smaller than the files you are
>> calculating it over (and it usually is), then by the pigeonhole
>> principle a large number of distinct objects (roughly 2^(object bits -
>> checksum bits) for objects of a given size) must map to the same
>> checksum.
> 
> I understand.
> 
>> Please, please, if you discover that two mime parts have the same
>> checksum and the same size, please check that they are really equal
>> (fetch the saved one from the storage and compare for equality). Such
>> comparisons won't happen often (checksum collisions are designed to be
>> rare), so it should not cost a lot of computing time. But if you don't
>> do it, sooner or later you will lose a mime part that was truly
>> different from what was already stored, and your users (and my users!)
>> will be furious.
> 
> Checksum collisions must happen because inserting the same message twice
> will result in collisions on the header and body part. That is by
> design. I need to establish a reliable unique primary key that will
> identify a mime part.
> 
> What we don't want are collisions between mime parts that are actually
> different. Using sha1 is debatable, I understand. It looks like using
> tiger would greatly reduce the risk of unintentional collision.
> 
> Of course we could do a double digest - a sha1 plus a tiger, and combine
> those into a single key. What would be the chances of accidental
> collisions then?
> 

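For what it's worth, the verify-on-match step Matija describes is cheap to
sketch. Rough Python below; the dict-backed store, function names, and the
(digest, size) key are purely illustrative, not dbmail's actual schema:

```python
import hashlib

def part_key(data: bytes) -> tuple:
    # Key a mime part by (sha1 hex digest, length); matching on length
    # already rules out many would-be collisions for free.
    return (hashlib.sha1(data).hexdigest(), len(data))

def store_part(store: dict, data: bytes):
    """Deduplicating insert: on a digest+size match, byte-compare the
    stored copy before reusing it, as Matija suggests."""
    key = part_key(data)
    if key in store:
        if store[key] == data:          # true duplicate: safe to dedup
            return key, "deduplicated"
        # Same digest and size but different bytes: a real collision.
        # A production store would need a tie-breaker column here.
        raise RuntimeError("sha1+size collision on different content")
    store[key] = data
    return key, "stored"
```

The extra fetch-and-compare only runs on a digest+size match, so in the common
case it costs nothing.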
I'm a big fan of double digest. I don't think the algorithm matters much,
just as long as the two are very different. Even just MD5 and SHA1 should
be plenty good, IMHO.

Aaron
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail