Paul J Stevens wrote:
> Being 'pretty safe' was deemed unacceptable at the time. Odds for 
> running into collisions were significantly less than astronomical when 
> taking into account the http://en.wikipedia.org/wiki/Birthday_attack.

I totally agree "pretty safe" is unacceptable for something like this. The
last thing you'd want is any potential for someone's mail to show up in a
different inbox.

Do you think combining more than one type of hash would help with this? I.e.,
store both an MD5 and a SHA-1 digest, which in theory reduces the chance of a
collision (on the assumption that two different values are unlikely to collide
under both hashes at once). Of course, testing for such cases would be
extremely difficult, and double hashing is slower again.
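
A rough sketch of the combined-hash idea, assuming Python's hashlib. The
names combined_key and birthday_collision_probability are made up for
illustration and are not anything in dbmail. The probability is the standard
birthday approximation for accidental collisions between uniformly random
hash values; it says nothing about deliberately crafted collisions, which are
known for both MD5 and SHA-1.

```python
import hashlib
import math


def combined_key(blob: bytes) -> str:
    """Concatenate the MD5 and SHA-1 hex digests of a blob.

    Hypothetical key scheme: 32 hex chars of MD5 followed by 40 of SHA-1.
    """
    return hashlib.md5(blob).hexdigest() + hashlib.sha1(blob).hexdigest()


def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one accidental collision.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)),
    which assumes the hash values are uniformly random.
    """
    exponent = -(n_items * (n_items - 1)) / float(2 ** (hash_bits + 1))
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    parts = 10 ** 9  # assumed number of stored mime parts
    print("MD5 alone (128 bits):   ", birthday_collision_probability(parts, 128))
    print("MD5 + SHA-1 (288 bits): ", birthday_collision_probability(parts, 288))
    print("key for b'hello':", combined_key(b"hello"))
```

Under that idealized model the combined key makes an accidental collision far
less likely still, but at the cost of hashing every blob twice, and it does
not remove the possibility altogether.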
 

tabris wrote:
> 
> Could you not make it 2-stage, and then configurable as to whether to do
> split query or not?
> 
> Basically have both queries. Only if the first succeeds do you check the
> second.
> Kinda like this:
> 
> if("SELECT 1 FROM dbmail_mimeparts WHERE hash=? AND size=? LIMIT 1") {
>       if("SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND blob=?") {
>       }
> }
> 
> It would admittedly make the case where there is a legitimate collision
> (legitimate as in, this _really_ is the same blob) slower. I don't know if
> this is a particular problem or not. The case of the second query failing
> (but the first succeeded) should be a degenerate case.
> 

Actually, I think this is a great approach. I'd probably even build it in
without an option: in the vast majority of cases the second query wouldn't
need to run, so it's a win-win. Nice thinking.
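
For illustration, here is a minimal sketch of the two-stage lookup using
Python's sqlite3 with an in-memory database. The table layout just follows
the column names in the queries quoted above and is an assumption, not the
real dbmail schema; find_or_store_part, make_demo_db and the use of SHA-1 are
likewise hypothetical.

```python
import hashlib
import sqlite3


def sha1_hex(data: bytes) -> str:
    """Default digest for the sketch (assumed; dbmail may use another)."""
    return hashlib.sha1(data).hexdigest()


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the quoted queries expect."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE dbmail_mimeparts ('
        'id INTEGER PRIMARY KEY, hash TEXT NOT NULL, '
        'size INTEGER NOT NULL, "blob" BLOB NOT NULL)'
    )
    # Lets the first-stage query be answered without reading any blob.
    conn.execute("CREATE INDEX idx_hash_size ON dbmail_mimeparts (hash, size)")
    return conn


def find_or_store_part(conn, blob: bytes, hash_fn=sha1_hex) -> int:
    """Return the id of an identical stored part, inserting it if absent."""
    digest = hash_fn(blob)
    size = len(blob)

    # Stage 1: cheap check on hash and size only.
    hit = conn.execute(
        "SELECT 1 FROM dbmail_mimeparts WHERE hash=? AND size=? LIMIT 1",
        (digest, size),
    ).fetchone()

    if hit is not None:
        # Stage 2: only now compare the blob itself.
        row = conn.execute(
            'SELECT id FROM dbmail_mimeparts '
            'WHERE hash=? AND size=? AND "blob"=?',
            (digest, size, blob),
        ).fetchone()
        if row is not None:
            return row[0]
        # Degenerate case: same hash and size, different content.

    cur = conn.execute(
        'INSERT INTO dbmail_mimeparts (hash, size, "blob") VALUES (?, ?, ?)',
        (digest, size, blob),
    )
    return cur.lastrowid
```

With this shape, new parts cost only the cheap first query, and the blob
comparison is paid only when a part with the same hash and size is already
stored.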
-- 
View this message in context: 
http://old.nabble.com/blob_exists---selects-based-on-blob-as-well-as-hash--tp31470216p31474635.html
Sent from the dbmail dev mailing list archive at Nabble.com.
