On 04/25/2011 12:47 PM, Chris Boulton wrote:
> General question...
> 
> blob_exists does this - "SELECT id FROM dbmail_mimeparts WHERE hash=? AND
> size=? AND blob=?"
> 
> Any reason why " AND blob=?" is also included, even though we're querying
> based on hash and size? Is it to absolutely, 100% confirm that there's no
> potential collision?

Yep. That was the *only* reason.

My initial design matched only on the hash, later augmented by the size.
Since neither of these, however, rules out a collision with absolute
certainty, the procedure above was chosen as the most efficient one that does.
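
A minimal sketch of what the strict lookup buys you, using Python's sqlite3 and a simplified, hypothetical stand-in for the dbmail_mimeparts table (this is not DBMail's actual C code or schema, and SHA-1 is used purely for illustration). Since a real collision can't be produced on demand, one is simulated by storing a part whose hash and size match b"hello" but whose bytes differ:

```python
import hashlib
import sqlite3
from typing import Optional

# Simplified stand-in for dbmail_mimeparts (assumption: the real table
# has more columns and may use a different hash algorithm).
SCHEMA = ("CREATE TABLE dbmail_mimeparts ("
          "id INTEGER PRIMARY KEY, hash TEXT, size INTEGER, blob BLOB)")

def part_hash(data: bytes) -> str:
    # SHA-1 chosen only for illustration.
    return hashlib.sha1(data).hexdigest()

def blob_exists(conn: sqlite3.Connection, data: bytes) -> Optional[int]:
    """Strict lookup: hash and size narrow the candidates, and the
    byte-for-byte blob comparison rules out a hash collision."""
    row = conn.execute(
        "SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND blob=?",
        (part_hash(data), len(data), data),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Simulated collision: same hash and size as b"hello", different content.
conn.execute(
    "INSERT INTO dbmail_mimeparts (hash, size, blob) VALUES (?, ?, ?)",
    (part_hash(b"hello"), 5, b"jello"),
)
print(blob_exists(conn, b"hello"))  # prints None: the contents differ
```

Matching on hash and size alone would have returned the colliding row here and silently linked the new message to the wrong content; the price of ruling that out is shipping the full part to the server on every lookup.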

> If you're running a database over the network, sending 10M over the wire for
> a single part (most likely twice - once for the select and once for the
> insert) seems to add quite a bit of unnecessary overhead. Then you're
> forcing the DBMS to load the entire blob from disk (forcing it into
> cache) and compare the entire string.
> 
> Given that there are no registered SHA-1 collisions (at the standard 80
> rounds, anyway), nor any for SHA-2 (256/512), I think we'd be pretty safe to
> stop sending the blob for comparison as well? I'm *slightly* more concerned
> about MD5 and the other hashing algorithms, which I'm not familiar with.

Being 'pretty safe' was deemed unacceptable at the time. Once you take
the birthday attack (http://en.wikipedia.org/wiki/Birthday_attack) into
account, the odds of running into a collision were far less remote than
'astronomical'.
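
For a rough sense of scale: with n distinct parts and a b-bit hash, the standard birthday-bound approximation puts the chance of at least one accidental collision at about p = 1 - exp(-n(n-1) / 2^(b+1)), which reaches 50% when n is on the order of 2^(b/2). The sketch below computes that figure, assuming an ideal, uniformly distributed hash; it says nothing about deliberately constructed collisions, which are a separate concern for MD5.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that at least two of n distinct parts share a
    bits-wide hash, assuming the hash output is uniformly random.

    Uses p ~= 1 - exp(-n(n-1) / 2**(bits + 1)); expm1 keeps precision
    when the result is tiny.
    """
    pairs = n * (n - 1) // 2        # number of distinct pairs of parts
    expected = pairs / 2 ** bits    # expected number of colliding pairs
    return -math.expm1(-expected)

# One billion stored parts against MD5, SHA-1 and SHA-256 digest sizes.
for bits in (128, 160, 256):
    print(bits, collision_probability(10**9, bits))
```

The per-installation risk is tiny for realistic mailstores, but unlike the blob comparison it is never zero, which is the whole point of the strict policy.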

However, since this is basically a policy decision, it would be quite
feasible to make this algorithm a runtime configuration option, allowing
system administrators to select either the strict policy now in use or a
more relaxed and faster policy where blob=? is dropped.
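
A minimal sketch of how such a switch might look, in Python for brevity. The setting name and its "strict"/"relaxed" values are hypothetical (DBMail has no such option at the time of writing); the idea is simply that the policy picks the query and its parameters, and anything unrecognised falls back to the strict behaviour:

```python
from enum import Enum
from typing import Tuple

class DedupPolicy(Enum):
    STRICT = "strict"    # hash + size + full blob comparison (current behaviour)
    RELAXED = "relaxed"  # hash + size only; the blob is never sent for the lookup

def parse_policy(value: str) -> DedupPolicy:
    # Unknown values fall back to STRICT, so a typo in the config
    # never silently weakens the collision guarantee.
    try:
        return DedupPolicy(value.strip().lower())
    except ValueError:
        return DedupPolicy.STRICT

def lookup_query(policy: DedupPolicy, digest: str, data: bytes) -> Tuple[str, tuple]:
    """Return the SELECT statement and its parameters for the given policy."""
    if policy is DedupPolicy.STRICT:
        return ("SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND blob=?",
                (digest, len(data), data))
    return ("SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=?",
            (digest, len(data)))
```

Defaulting to strict keeps existing installations unchanged; only an administrator who explicitly opts in trades the absolute guarantee for less network and disk traffic.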



-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
