On 04/25/2011 12:47 PM, Chris Boulton wrote:
> General question...
>
> blob_exists does this - "SELECT id FROM dbmail_mimeparts WHERE hash=? AND
> size=? AND blob=?"
>
> Any reason why " AND blob=?" is also included, even though we're querying
> based on hash and size? Is it to absolutely, 100% confirm that there's no
> potential collision?
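To make the trade-off concrete, here is a minimal sketch of the lookup in Python against an in-memory SQLite database. It is not dbmail's code: dbmail's blob_exists is C against its real schema, and the simplified table layout, the use of SHA-256, and the helper names open_store/store_part and the strict flag are all assumptions for illustration. With strict=True the whole part is bound as a parameter and compared by the database, as in the query quoted above; with strict=False only the digest and length are sent.

```python
import hashlib
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    # Hypothetical, simplified stand-in for the dbmail_mimeparts table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE dbmail_mimeparts ('
        ' id INTEGER PRIMARY KEY, hash TEXT NOT NULL,'
        ' size INTEGER NOT NULL, "blob" BLOB NOT NULL)'
    )
    conn.execute("CREATE INDEX mimeparts_hash_size ON dbmail_mimeparts (hash, size)")
    return conn


def blob_exists(conn: sqlite3.Connection, data: bytes, strict: bool = True) -> Optional[int]:
    digest = hashlib.sha256(data).hexdigest()  # assumed hash algorithm
    if strict:
        # Strict policy: the full part travels to the database and is compared byte for byte.
        row = conn.execute(
            'SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=? AND "blob"=?',
            (digest, len(data), data),
        ).fetchone()
    else:
        # Relaxed policy: (hash, size) alone decides; only the digest and length are sent.
        row = conn.execute(
            "SELECT id FROM dbmail_mimeparts WHERE hash=? AND size=?",
            (digest, len(data)),
        ).fetchone()
    return row[0] if row else None


def store_part(conn: sqlite3.Connection, data: bytes, strict: bool = True) -> int:
    # Reuse an existing row when the lookup finds one, otherwise insert the part.
    existing = blob_exists(conn, data, strict)
    if existing is not None:
        return existing
    cur = conn.execute(
        'INSERT INTO dbmail_mimeparts (hash, size, "blob") VALUES (?, ?, ?)',
        (hashlib.sha256(data).hexdigest(), len(data), data),
    )
    return cur.lastrowid


if __name__ == "__main__":
    conn = open_store()
    first = store_part(conn, b"hello")
    again = store_part(conn, b"hello", strict=False)
    print(first == again)  # the second call reuses the stored part
```

Both policies deduplicate identical parts; they differ only in whether a (hash, size) match is trusted without checking the bytes, which is exactly where a collision would slip through.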
Yep. That was the *only* reason. My initial design matched only on the hash; the size check was added later. Since neither guarantees the absence of collisions with absolute certainty, however, the procedure above was selected as the most efficient option that does.

> If you're running a database over the network, sending 10M over the wire for
> a single part (most likely twice - once for the select and once for the
> insert) seems to add quite a bit of unnecessary overhead; then you're
> forcing the DBMS to load the entire blob from the disk (forcing it into
> cache), and compare the entire string.
>
> Given that we don't have any registered SHA-1 collisions (at the standard 80
> rounds anyway), or any for SHA-2 (256/512), I think we'd be pretty safe to stop
> sending the blob for comparison as well? I'm *slightly* more concerned about
> MD5 and the other hashing algorithms which I'm not familiar with.

Being 'pretty safe' was deemed unacceptable at the time. Once you take the birthday attack into account (http://en.wikipedia.org/wiki/Birthday_attack), the odds of running into a collision are considerably less astronomical than they first appear.

That said, since this is basically a policy decision, it would be quite feasible to make this algorithm a runtime configuration option, allowing system administrators to select either the strict policy now in use or a more relaxed and faster policy where blob=? is dropped.

-- 
________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail-dev
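As a rough illustration of the birthday argument, the sketch below computes the standard approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)) for the chance that at least two of n distinct parts share a b-bit hash. The store size of one billion parts is a hypothetical figure, and the formula assumes the hash behaves like a random function, so it covers accidental collisions only; it says nothing about deliberately constructed collisions, which are known to be practical for MD5.

```python
import math


def collision_probability(n_parts: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_parts distinct blobs share a hash.

    Birthday approximation p ~= 1 - exp(-n(n-1) / 2^(b+1)), assuming the hash
    behaves like a uniformly random function (accidental collisions only).
    """
    exponent = n_parts * (n_parts - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision for tiny exponents, where 1 - exp(-x) would round to 0.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    n = 10**9  # hypothetical number of distinct mime parts in one store
    for name, bits in (("MD5", 128), ("SHA-1", 160), ("SHA-256", 256)):
        print(f"{name:8s} {collision_probability(n, bits):.3e}")
```

Whether figures like these count as 'pretty safe' is exactly the policy question discussed above; the strict blob=? comparison removes the residual risk regardless of how the hash is attacked.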