Paul J Stevens wrote:
> Being 'pretty safe' was deemed unacceptable at the time. Odds for
> running into collisions were significantly less than astronomical when
> taking into account the http://en.wikipedia.org/wiki/Birthday_attack.
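
For a rough feel for the birthday effect mentioned above, here is a minimal sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^b)) for n stored items and a b-bit hash. It assumes hash outputs behave like uniformly random values, so it only covers accidental collisions, not deliberately crafted ones. The function name and parameters are made up for illustration and are not part of dbmail.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value.

    Assumes an idealised hash with uniformly random hash_bits-bit output.
    """
    # Number of distinct pairs that could collide.
    pairs = n_items * (n_items - 1) / 2
    # 1 - exp(-x), computed via expm1 so tiny probabilities do not round to 0.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    # Compare a few hash widths for a hypothetical store of a billion parts.
    for bits in (32, 64, 128, 160):
        print(bits, collision_probability(10**9, bits))
```

The point is that the probability grows with the square of the number of stored parts, which is why the odds are worse than the raw hash width suggests.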

I totally agree that "pretty safe" is unacceptable for something like this. The last thing you'd want is any potential for someone's mail to show up in a different inbox.

Do you think combining more than one type of hash would help with this? That is, store both a calculated MD5 and a SHA-1, which in theory reduces the chance of a collision (on the assumption that two different blobs which collide under one hash are unlikely to collide under the other as well). Of course, testing for such cases would be extremely difficult, and double hashing is slower again.

tabris wrote:
>
> Could you not make it 2-stage, and then configurable as to whether to do
> split query or not?
>
> Basically have both queries. Only if the first succeeds do you check the
> second.
> Kinda like this:
>
> if("SELECT 1 FROM dbmail_mimeparts WHERE hash=? AND size=? LIMIT 1") {
>   if("SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND blob=?")
>   {
>   }
> }
>
> It would admittedly make the case where there is a legitimate collision
> (legitimate as in, this _really_ is the same blob) slower. I don't know if
> this is a particular problem or not. The case of the second query failing
> (but the first succeeded) should be a degenerate case.
>

Actually, I think this is a great approach. I'd probably even build it in without an option: in the vast majority of cases the second query wouldn't need to run, so it's a win-win. Nice thinking.
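
To make the two-stage idea concrete, here is a rough, self-contained sketch using Python's sqlite3 module rather than dbmail's actual C code. The schema (a `data` column holding the blob), the choice of SHA-1, and the helper names `open_store` and `find_or_store` are assumptions made for the example, not dbmail's real implementation.

```python
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table loosely modelled on dbmail_mimeparts (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dbmail_mimeparts ("
        "id INTEGER PRIMARY KEY, hash TEXT, size INTEGER, data BLOB)"
    )
    conn.execute("CREATE INDEX idx_hash_size ON dbmail_mimeparts (hash, size)")
    return conn


def find_or_store(conn: sqlite3.Connection, blob: bytes) -> int:
    """Return the id of an identical stored blob, inserting it if none exists."""
    digest = hashlib.sha1(blob).hexdigest()
    size = len(blob)

    # Stage 1: cheap probe on hash and size only.
    hit = conn.execute(
        "SELECT 1 FROM dbmail_mimeparts WHERE hash=? AND size=? LIMIT 1",
        (digest, size),
    ).fetchone()

    if hit is not None:
        # Stage 2: only now pay for the byte-for-byte comparison.
        row = conn.execute(
            "SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND data=?",
            (digest, size, blob),
        ).fetchone()
        if row is not None:
            return row[0]
        # Same hash and size but different bytes: a genuine collision,
        # so fall through and store the new blob as its own row.

    cur = conn.execute(
        "INSERT INTO dbmail_mimeparts (hash, size, data) VALUES (?, ?, ?)",
        (digest, size, blob),
    )
    return cur.lastrowid
```

With this shape a brand-new blob costs one indexed probe plus the insert, and the full blob comparison only runs when the hash and size already match a stored row.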