Jake Anderson wrote:
> I think if you comprise an index based on hash and file size it might go
> a ways to speeding things up with minimal cost. You only need to worry
> about a collision if the file/chunk size and the hash are the same.
Sorry Jake, but that kite won't fly (Dutch slang).

Adding the size to the index will only increase the size of the
fingerprint. That would essentially be the same as doing composite
hashes, using the size as a poor man's checksum.

Consider that nine out of ten times, when you retrieve chunks based on
their hash value, they will indeed be the same chunk. But in each of
those cases we *must* do a memcmp to make bloody sure they really are
the same. I don't see any way around that.

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl

_______________________________________________
DBmail mailing list
[email protected]
https://mailman.fastxs.nl/mailman/listinfo/dbmail
