Jake Anderson wrote:

> I think if you comprise an index based on hash and file size it might go
> a ways to speeding things up with minimal cost. You only need to worry
> about a collision if the file/chunk size and the hash are the same.

Sorry Jake, but that kite won't fly (Dutch slang).

Adding the size to the index would only increase the size of the
fingerprint. That would essentially be the same as doing composite
hashes (using the size as a poor man's checksum).

Consider that nine times out of ten, when you retrieve chunks by their
hash value, they will indeed be the same chunk. But in each of those
cases we *must* do a memcmp to make bloody sure they really are the
same. I don't see any way around that.


-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail