Hi Marc and Thomas,

I followed your discussion with great interest. I agree that Thomas's very light proposal is worth putting in place, since it has almost no negative impact and only benefits. I think the object issue could also be mitigated with something similar (checking the integrity of what we get, to at least detect a problem), though that is not perfect of course.

That said, I would like to point you to this interesting question on StackOverflow (https://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code-collisions) and remind you that, based on the Birthday Paradox, with the release of 4.x we raised our worrying threshold for the number of documents/objects from 65535 to more than 4 billion… and it took a while (4 versions of XWiki) before we strongly felt the need to raise it. So, while the worrying threshold before 4.x was really low, the actual occurrence of a collision was already rare.

My own experience was that the risk before 4.x was really high with generated names, much higher than with names used by real users. When I was hit by that issue, I remember feeling really bad about it. This is probably also why you raised this thread. The previous hash was too small and also had a questionable distribution.

The MD5 algorithm, like many cryptographic hashes, is particularly well suited to providing a good distribution (http://michiel.buddingh.eu/distribution-of-hash-values). Truncating it to 64 bits may degrade this, but I doubt it would be significant for us. So, personally, I feel really comfortable with the current implementation, and I think you can sleep in peace as well.

Just my thoughts about not raising fears when they are no longer really justified.

Regards,
--
Denis Gervalle
SOFTEC sa - CEO
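
As a rough check on the birthday-bound figures in this message, here is a minimal Python sketch. It assumes the pre-4.x identifier was a 32-bit hash and the current one is a 64-bit value cut from an MD5 digest, which is what the two thresholds suggest. It is not XWiki's actual code: `md5_64`, `birthday_threshold` and `collision_probability` are hypothetical helpers, and which 8 bytes of the digest are kept is an assumption.

```python
import hashlib
import math


def birthday_threshold(bits: int) -> int:
    """Item count around which a collision becomes likely: about 2**(bits/2)."""
    return 2 ** (bits // 2)


def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / 2**(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


def md5_64(name: str) -> int:
    """Hypothetical helper: first 8 bytes of the MD5 digest as a signed 64-bit integer."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


if __name__ == "__main__":
    print(birthday_threshold(32))               # 65536
    print(birthday_threshold(64))               # 4294967296
    print(collision_probability(65536, 32))     # roughly 0.39
    print(collision_probability(10**6, 64))     # roughly 2.7e-08
    print(md5_64("Main.WebHome"))
```

Under these assumptions, a wiki with a million documents sits far below the 64-bit threshold, which matches the point that the remaining risk is very small.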