On Thu, 12 Feb 2015 17:21:50 +0100, Warren Young <w...@etr-usa.com> wrote:
On Feb 11, 2015, at 7:23 AM, Richard Hipp <d...@sqlite.org> wrote:
On 2/11/15, j. van den hoff <veedeeh...@googlemail.com> wrote:
whatever the reason, the netbsd example (a worst case scenario, really)
would suggest choosing 12 instead of 10 as the future default length
to avoid collisions for the next few hundred years.
Maybe the default prefix lengths should auto-adjust depending on the
number of artifacts in the repository?
That could work.
If you rearrange the square approximation formula from the Wikipedia
article to solve for m, then take the base-2 log of that to get bits,
then divide by 4 (bits per nibble, i.e. per hex digit), you get:
d = log2(n^2 / (2p)) / 4
That is to say, it gives you the number of digits d required to achieve
a given chance of collision p in a hash set of size n.
So for my 97,000 hash repo, we need 13 digits to approach p=1e-6.
I suggest making the p value configurable, with a reasonable default.
1e-6 sounds good to me.
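As a rough check of the formula above, here is a minimal Python sketch. It is not part of Fossil; the names `prefix_digits` and `collision_probability` are made up for illustration, and it assumes the square approximation p ~ n^2 / (2m) with m = 16^d possible prefixes.

```python
import math


def prefix_digits(n, p=1e-6):
    """Hex digits d such that n hashes collide with probability about p.

    Rearranged square approximation: m = n^2 / (2p), d = log2(m) / 4.
    The result is fractional; the caller decides how to round it.
    """
    return math.log2(n * n / (2 * p)) / 4


def collision_probability(n, digits):
    """Approximate collision chance for n hashes at a given prefix length."""
    m = 16 ** digits  # number of distinct prefixes of that length
    return n * n / (2 * m)


if __name__ == "__main__":
    d = prefix_digits(97_000)  # p defaults to 1e-6
    print(f"digits needed: {d:.2f}")
    print(f"p at 13 digits: {collision_probability(97_000, 13):.3e}")
```

For n = 97,000 and p = 1e-6 this comes out a little over 13 digits, and at exactly 13 digits the approximate collision chance is slightly above 1e-6, which matches "approach" in the sentence above.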
this would probably have to be done via `fossil set'. if it is made to be
user-configurable at all (which I would welcome) it sounds simpler to me
to just provide the user with the ability to change the number of digits,
i.e. something like a new option
`fossil set hash-prefix-length'
so one can stay with the default of 10 until a real need for changing it
occurs.
what I mean: it makes more sense to me to just wait (using the 10 digits)
until the first collision causes a hiccup (which might be much later than
the creation of the collision in the repo: netbsd currently has four
10-digit collisions -- but how often have they been triggered? probably
never, since you would actually have to pick one of those 8 artifacts out
of the 2e6 artifacts for the intended action (say a `cat')).
if the user notices a collision, he can _then_ increment his setting by 1
or 2 digits or whatever he sees fit.
making the probability threshold user-settable instead seems less
convenient since it is just a statistical measure, and it is
counterintuitive in the sense that it tells you something about the
chance of a collision occurring rather than the chance of hitting one.
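to put a number on the difference between a collision existing and a collision being hit, here is a small Python sketch. it assumes artifacts are picked uniformly at random and independently of each other, which real usage certainly is not; `hit_chance` is a made-up helper, and the inputs are the netbsd figures quoted above (8 colliding artifacts out of roughly 2e6).

```python
def hit_chance(colliding_artifacts, total_artifacts, lookups=1):
    """Chance that at least one of `lookups` picks lands on a colliding artifact.

    Assumes every pick is uniform over all artifacts and independent
    of the others (an idealization, labeled as such).
    """
    q = colliding_artifacts / total_artifacts  # chance for a single pick
    return 1 - (1 - q) ** lookups


if __name__ == "__main__":
    print(f"one lookup:   {hit_chance(8, 2_000_000):.2e}")
    print(f"1000 lookups: {hit_chance(8, 2_000_000, 1000):.2e}")
```

under these assumptions a single lookup hits a collision with a chance of about 4 in a million, so even a thousand lookups by prefix would only rarely trigger one.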
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users