On 05/09/13 15:21, David Given wrote: > Richard Hipp wrote: [...] >> HHH -> WORD -> HH >> HHHHHH -> WORD-WORD -> HHHHH >> HHHHHHHH -> WORD-WORD-WORD -> HHHHHHHH [...] > I'll have a go tonight and see what happens --- the encoder/decoder is > wrong and needs to be rewritten anyway.
After having attempted it, I think this is actually going to be really hard, or at any rate beyond my math skills at 2200 at night. The way the encoding works is that it divides the number up into three base 1626 digits, and those digits are indices into the dictionary. This means that each digit represents 10 2/3 bits of the hash. However, truncating the hash by dropping hex digits involves truncating it to a whole number of bits wide. This is equivalent to taking the modulus of a power of 2. Unfortunately because this isn't a multiple of 1626, it's going to perturb all the base 1626 digits, and so change the encoding. A slightly more comprehensible example using bases 9 and 10 demonstrates the problem: 123456789_10 % 10000000_9 = 123456789 % 4782969 = 3882564 If anyone can prove me wrong, please do... I think, without a mathematical proof, that maintaining the ability to take prefixes of an encoded name will require us to use a dictionary that fits into a precise number of bits. Truncating the dictionary to 2^10 entries would be the simplest approach, but this means that our three words no longer encode exactly 32 bits --- we only get 30 bits, which is 7 hex characters. Four words gets us 40 bits, which is 10 hex characters. We don't get anything in between. I don't think seven hex characters is unique enough, based on the stats other people have figured out. Could someone do the collision check on the NetBSD repository, if they have a copy? (It's not worth checking out, as it's kind of huge.) (The alternative is a 2^11 word dictionary, which means coming up with another 422 suitable words from somewhere, which I don't think is feasible. The author claims the current dictionary took 300 hours to compile.) -- ┌─── dg@cowlark.com ───── http://www.cowlark.com ───── │ │ "Ripley's Law: Never go further for the cat than the cat would go for │ you." --- Vexxarr Bleen (trans. Hunter Cressall)
signature.asc
Description: OpenPGP digital signature
_______________________________________________ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users