On 05/09/13 15:21, David Given wrote:
> Richard Hipp wrote:
[...]
>>      HHH ->      WORD      -> HH
>>   HHHHHH ->    WORD-WORD   -> HHHHH
>> HHHHHHHH -> WORD-WORD-WORD -> HHHHHHHH
[...]
> I'll have a go tonight and see what happens --- the encoder/decoder is
> wrong and needs to be rewritten anyway.

After having attempted it, I think this is actually going to be really
hard, or at any rate beyond my math skills at 2200 at night.

The way the encoding works is that it divides the number up into three
base 1626 digits, and those digits are indices into the dictionary. This
means that each digit represents 10 2/3 bits of the hash.

However, truncating the hash by dropping hex digits involves truncating
it to a whole number of bits wide. This is equivalent to taking the
modulus of a power of 2. Unfortunately because this isn't a multiple of
1626, it's going to perturb all the base 1626 digits, and so change the
encoding.

A slightly more comprehensible example using bases 9 and 10 demonstrates
the problem:

123456789_10 % 10000000_9 = 123456789 % 4782969 = 3882564

If anyone can prove me wrong, please do...

I think, without a mathematical proof, that maintaining the ability to
take prefixes of an encoded name will require us to use a dictionary
that fits into a precise number of bits. Truncating the dictionary to
2^10 entries would be the simplest approach, but this means that our
three words no longer encode exactly 32 bits --- we only get 30 bits,
which is 7 hex characters. Four words gets us 40 bits, which is 10 hex
characters. We don't get anything in between.

I don't think seven hex characters is unique enough, based on the stats
other people have figured out. Could someone do the collision check on
the NetBSD repository, if they have a copy? (It's not worth checking
out, as it's kind of huge.)

(The alternative is a 2^11 word dictionary, which means coming up with
another 422 suitable words from somewhere, which I don't think is
feasible. The author claims the current dictionary took 300 hours to
compile.)

-- 
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ "Ripley's Law: Never go further for the cat than the cat would go for
│ you." --- Vexxarr Bleen (trans. Hunter Cressall)

Attachment: signature.asc
Description: OpenPGP digital signature

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users

Reply via email to