On 26/03/14 08:18, Osma Suominen wrote:
Hi!
25.03.2014 12:51, Rob Vesse wrote:
Regardless of the hash function used there is always a collision
probability. SDB uses MD5 which has a probability of approximately
2^20.96 according to
http://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions#Cryptanalysis
so approximately 1 in 2 million
I think this number applies only for deliberate attempts to generate a
collision (the table heading on Wikipedia says "Best known attacks").
The probability for coincidental collisions, which I hope is the more
relevant case here, should be much lower, probably closer to 1 in 2^128,
since the MD5 digest is 128 bits. Otherwise you would likely get a hash
collision with an SDB store having only a few million triples.
Yes, it's coincidental collisions that matter.
The 3Store paper puts the probability as approaching 1% at around 1
billion items for that system, which uses a 64-bit hash as well. SDB does
not, in practice, scale to a billion nodes.
(TDB uses a 128-bit hash at the moment.)
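
For anyone who wants to check the orders of magnitude, here is a rough
sketch using the standard birthday-bound approximation. It assumes hash
values are uniformly distributed, and the function name
collision_probability is made up for illustration; it is not part of
SDB, TDB or 3Store.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one coincidental collision.

    Birthday-bound approximation, assuming uniformly distributed hashes:
        P ~= 1 - exp(-n * (n - 1) / 2^(bits + 1))
    """
    exponent = -n_items * (n_items - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for n in (10**6, 10**8, 10**9):
        p64 = collision_probability(n, 64)
        p128 = collision_probability(n, 128)
        print(f"{n:>13,d} items: 64-bit {p64:.3e}, 128-bit {p128:.3e}")
```

Under this approximation a 64-bit hash at a billion items comes out in
the same order of magnitude as the figure quoted above, while a 128-bit
hash stays negligible at that size.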
Andy
-Osma