Aleksander Kamenik wrote:
> Paul J Stevens wrote:
>
>> - we now use the cryptographic hash only to quickly locate possibly
>> duplicate mime-parts. If the hash doesn't occur yet, a new mimepart is
>> stored using the hash, but with an auto-increment bigint generated as its
>> primary key. If the hash does occur, the insertion code compares the
>> blobs to make sure that different blobs sharing a hash are not treated
>> as duplicates.
>
> Out of curiosity, what happens if a collision occurs? How do you store
> that block? "hashvalue + 1" and repeat the cycle?
No, no: the hash is no longer the primary key. The primary key is now an
auto-increment integer.
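To illustrate why a collision is harmless once the hash is no longer the key, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (columns id, hash, data) follows the columns used in the queries of the pseudocode, but the exact DDL and the index name mimeparts_hash_idx are assumptions for illustration, not DBMail's actual schema. The hash gets an ordinary, non-unique index, so two different blobs that happen to share a hash value each receive their own id.

```python
import sqlite3

# Hypothetical schema: surrogate auto-increment key; the hash is only a lookup column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mimeparts ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " hash TEXT NOT NULL,"
    " data BLOB NOT NULL)"
)
# Non-unique index: fast lookups by hash without forbidding duplicate hash values.
conn.execute("CREATE INDEX mimeparts_hash_idx ON mimeparts (hash)")

# Simulate a collision: two different blobs stored under the same hash value.
fake_hash = "deadbeef"
id1 = conn.execute(
    "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (fake_hash, b"first blob")
).lastrowid
id2 = conn.execute(
    "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (fake_hash, b"second blob")
).lastrowid

# Both rows exist side by side, each with its own primary key.
rows = conn.execute(
    "SELECT id, data FROM mimeparts WHERE hash = ? ORDER BY id", (fake_hash,)
).fetchall()
print(id1, id2)
print(rows)
```

No "hashvalue + 1" probing is needed: the database assigns the next id, and the hash column simply holds the same value twice.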
In pseudocode (Python-style; q() runs a query, get_hash() computes the hash),
I now do:

    def get_blob_id(buf):
        newhash = get_hash(buf)
        # the hash only narrows the candidates; compare the actual blobs
        for (id, data) in q("select id,data from mimeparts where hash=?", newhash):
            if data == buf:
                return id
        return 0  # no identical blob stored yet

    def store_blob(buf):
        if id := get_blob_id(buf):
            return id  # no need to store this one again
        hash = get_hash(buf)
        id = q("insert into mimeparts (hash,data) values (?,?) returning id", hash, buf)
        return id
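For anyone who wants something executable, here is a minimal sketch of the same two functions against an in-memory SQLite database, using Python's standard sqlite3 and hashlib modules. Assumptions: SHA-1 stands in for get_hash (the post does not say which hash is used), the table layout is hypothetical rather than DBMail's real schema, and cursor.lastrowid replaces PostgreSQL's "returning id". The hash_fn parameter is also an addition, so a deliberately weak hash can force a collision and show that the byte-for-byte comparison keeps distinct blobs apart.

```python
import hashlib
import sqlite3


def sha1_hex(buf: bytes) -> str:
    """Assumed get_hash(): hex SHA-1 digest of the blob."""
    return hashlib.sha1(buf).hexdigest()


def open_db() -> sqlite3.Connection:
    """Create a hypothetical mimeparts table with a non-unique hash index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mimeparts ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " hash TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )
    conn.execute("CREATE INDEX mimeparts_hash_idx ON mimeparts (hash)")
    return conn


def get_blob_id(conn: sqlite3.Connection, buf: bytes, hash_fn=sha1_hex) -> int:
    """Return the id of an identical stored blob, or 0 if there is none."""
    rows = conn.execute(
        "SELECT id, data FROM mimeparts WHERE hash = ?", (hash_fn(buf),)
    )
    for blob_id, data in rows:
        if data == buf:  # a hash match is only a candidate; compare the bytes
            return blob_id
    return 0


def store_blob(conn: sqlite3.Connection, buf: bytes, hash_fn=sha1_hex) -> int:
    """Store buf unless an identical blob exists; return its id either way."""
    existing = get_blob_id(conn, buf, hash_fn)
    if existing:
        return existing  # no need to store this one again
    cur = conn.execute(
        "INSERT INTO mimeparts (hash, data) VALUES (?, ?)", (hash_fn(buf), buf)
    )
    return cur.lastrowid


def weak_hash(buf: bytes) -> str:
    """Deliberately terrible hash (blob length) used only to force collisions."""
    return str(len(buf))


if __name__ == "__main__":
    db = open_db()
    a = store_blob(db, b"hello")
    b = store_blob(db, b"hello")  # duplicate: reuses the existing row
    print(a, b)  # same id twice

    c = store_blob(db, b"abc", hash_fn=weak_hash)
    d = store_blob(db, b"xyz", hash_fn=weak_hash)  # same "hash", different bytes
    print(c, d)  # two different ids
```

Returning 0 as the "not found" sentinel works here because SQLite's auto-increment ids start at 1; in idiomatic Python, returning None would be the more explicit choice.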
--
________________________________________________________________
Paul Stevens paul at nfg.nl
NET FACILITIES GROUP GPG/PGP: 1024D/11F8CD31
The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail