Aleksander Kamenik wrote:
> Paul J Stevens wrote:
> 
>> - we now use the cryptographic hash only to quickly locate possibly
>> duplicate mime-parts. If the hash doesn't occur yet, a new mimepart is
>> stored using the hash, but generating an auto-increment bigint as its
>> primary key. If the hash does occur, the insertion code compares the
>> blobs to make sure no hash collision occurs on different blobs.
> 
> Out of curiosity, what if a collision happens? How do you store this
> block? "hashvalue + 1" and repeat the cycle?

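No, no, the hash is no longer the primary key. The primary key is now an
auto-increment integer, so two different blobs that happen to share a hash
are simply stored as two rows with different ids.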

In pseudocode, I now do:

get_blob_id(buf):
  newhash = get_hash(buf)
  # the hash only narrows the candidates; the blob comparison decides
  for (id, data) in q("select id,data from mimeparts where hash=?", newhash):
      if data == buf: return id
  return 0  # not stored yet

store_blob(buf):
  id = get_blob_id(buf)
  if id: return id  # no need to store this one again
  hash = get_hash(buf)
  id = q("insert into mimeparts (hash,data) values (?,?) returning id", hash, buf)
  return id
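
For anyone who wants to try this out, here is a minimal runnable sketch of the same idea in Python against an in-memory SQLite database. It is not the DBmail code: the table layout, the choice of SHA-1 for get_hash, the open_store helper and the hash_fn parameter are all assumptions made for illustration. The hash_fn parameter exists only so that a collision can be forced on purpose.

```python
import hashlib
import sqlite3


def get_hash(buf: bytes) -> str:
    # Assumption: SHA-1 stands in for "the cryptographic hash".
    return hashlib.sha1(buf).hexdigest()


def open_store() -> sqlite3.Connection:
    # Hypothetical schema: the hash is an ordinary indexed column,
    # the primary key is an auto-increment integer.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mimeparts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " hash TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    db.execute("CREATE INDEX mimeparts_hash ON mimeparts (hash)")
    return db


def get_blob_id(db, buf, hash_fn=get_hash):
    # The hash only selects candidate rows; the byte comparison decides.
    rows = db.execute(
        "SELECT id, data FROM mimeparts WHERE hash = ?", (hash_fn(buf),)
    )
    for blob_id, data in rows:
        if bytes(data) == buf:
            return blob_id
    return 0  # ids start at 1, so 0 means "not stored yet"


def store_blob(db, buf, hash_fn=get_hash):
    blob_id = get_blob_id(db, buf, hash_fn)
    if blob_id:
        return blob_id  # no need to store this one again
    cur = db.execute(
        "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (hash_fn(buf), buf)
    )
    db.commit()
    return cur.lastrowid


if __name__ == "__main__":
    db = open_store()

    def weak(buf):
        # Deliberately broken hash: every blob collides.
        return "same"

    first = store_blob(db, b"part one", weak)
    second = store_blob(db, b"part two", weak)
    again = store_blob(db, b"part one", weak)
    print(first, second, again)
```

With the deliberately broken hash, the two different blobs end up as separate rows, and storing the first blob again returns its existing id instead of inserting a duplicate.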

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
https://mailman.fastxs.nl/mailman/listinfo/dbmail
