Hi!

25.03.2014 12:51, Rob Vesse wrote:
> Regardless of the hash function used there is always a collision probability. SDB uses MD5 which has a probability of approximately 2^20.96 according to http://en.wikipedia.org/wiki/Comparison_of_cryptographic_hash_functions#Cryptanalysis so approximately 1 in 2 million
I think this number applies only to deliberate attempts to generate a collision (the table heading on Wikipedia says "Best known attacks"). The probability of a coincidental collision, which I hope is the more relevant case here, should be much lower, probably closer to 1 in 2^128 for any given pair of values, since the MD5 digest is 128 bits. Otherwise you would likely get a hash collision in an SDB store with only a few million triples.
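To put rough numbers on this, here is a minimal sketch of the usual birthday-bound approximation. It assumes hash values are uniformly distributed, and the function name and the 5 million figure are mine, chosen only for illustration; nothing here is taken from SDB itself.

```python
import math


def collision_probability(n_items: int, space_bits: float) -> float:
    """Approximate P(at least one collision) among n_items values drawn
    uniformly at random from a space of 2**space_bits possible values.

    Uses the birthday-bound approximation 1 - exp(-n(n-1)/2 / N).
    """
    pairs = n_items * (n_items - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** space_bits)


if __name__ == "__main__":
    n = 5_000_000  # hypothetical number of distinct hashed values

    # Full 128-bit digest, accidental collisions only.
    print("128-bit space:    ", collision_probability(n, 128))

    # If the space really had only 2^20.96 values, as the quoted
    # figure would imply for accidental collisions.
    print("2^20.96 space:    ", collision_probability(n, 20.96))
```

With a 128-bit space the result is vanishingly small for a few million values, while a space of only about 2 million values would make a collision practically certain, which is the point made above.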
-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Teollisuuskatu 23)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi
