Matija Grabnar wrote:
> If you examine the mathematical theory, no matter how good the checksum
> algorithm, if your checksum number is smaller than the files you are
> calculating it over (and it usually is), then you will have a large
> number (approximately size of max object / size of the checksum) of
> objects which will result in the same checksum.
I understand.

> Please, please, if you discover that two mime parts have the same
> checksum, and the same size, please check that they are really equal
> (fetch the saved one from the storage and compare for equality). Such
> comparisons won't happen often (checksum collisions are designed to be
> rare), so it should not cost a lot of computing time. But if you don't
> do it, sooner or later you will lose a mime part that was truly
> different from what was already stored, and your users (and my users!)
> will be furious.

Checksum collisions must happen, because inserting the same message
twice will result in collisions on the header and body parts. That is by
design. I need to establish a reliable unique primary key that will
identify a mime part. What we don't want is collisions between mime
parts that are actually different.

Using sha1 is debatable, I understand. It looks like using tiger would
greatly reduce the risk of unintentional collisions. Of course we could
do a double digest - a sha1 plus a tiger - and combine those into a
single key. What would be the chances of accidental collisions then?

-- 
________________________________________________________________
Paul Stevens                                    paul at nfg.nl
NET FACILITIES GROUP                  GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl

_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
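For the archives, here is a minimal sketch of the verify-on-collision
scheme Matija describes, combined with the "double digest" key idea.
This is illustrative only, not DBmail code: the dict-backed store and
the function names are hypothetical, and since tiger is not available in
Python's standard hashlib, sha256 stands in for the second digest.

```python
import hashlib

# Hypothetical in-memory part store, keyed by (combined digest, size).
store = {}

def part_key(data: bytes) -> str:
    """Combine two digests into one key (the 'double digest' idea).

    sha256 is a stand-in here for tiger, which hashlib lacks.
    """
    return hashlib.sha1(data).hexdigest() + hashlib.sha256(data).hexdigest()

def store_part(data: bytes) -> str:
    """Store a mime part, deduplicating by digest + size.

    On a digest/size match, fetch the stored part and compare the
    actual bytes before treating the new part as a duplicate.
    """
    key = (part_key(data), len(data))
    existing = store.get(key)
    if existing is not None:
        # Same digest and same size: verify true equality, so a
        # different part is never silently lost to a collision.
        if existing != data:
            raise RuntimeError("digest collision between different parts")
        return key[0]  # genuine duplicate, reuse the stored part
    store[key] = data
    return key[0]
```

On the probability question: by the usual birthday bound, with an n-bit
digest the chance of any accidental collision among k distinct parts is
roughly k^2 / 2^(n+1), so concatenating two independent digests mainly
helps by enlarging n; the byte comparison above is what actually
guarantees correctness either way.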
