Re: TrieRange

Yonik Seeley Sat, 07 Feb 2009 05:46:27 -0800

On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <[email protected]> wrote:
> An optimization might be to remove
> the lower 0 bits from the string, but it would not be needed. The strings
> are unique for one precision (no difference between 0-bits there or not).


Yes, one would certainly want to remove trailing bits that were insignificant.

To optimize index space, one would want to "right justify" the encoded
number for any bit range to minimize variation on the left - this
plays into Lucene's prefix compression of terms.

For example, suppose we want to encode 7 bits per character (so each
character takes up only one byte in UTF-8), and we have 9 bits of data
to encode.

The two characters could be encoded like this (where x is a data bit
and 0 is padding):
xxxxxxx xx00000
Or this:
00000xx xxxxxxx

The latter is more efficient in index space since many more values
will share the same leading bits.
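
To put rough numbers on that, here is a minimal Python sketch (not Lucene
code; encode and total_shared_prefix are hypothetical helpers written just
for this illustration). It packs every 9-bit value into two 7-bit
characters both ways, then sums the prefix length each term shares with
its predecessor in sorted order, which is a simplified stand-in for what
prefix compression of the term dictionary can save.

```python
import os.path

BITS_PER_CHAR = 7  # every character stays below 128, so one byte in UTF-8


def encode(value, data_bits, right_justify):
    """Pack `value` (data_bits wide) into 7-bit characters.

    Hypothetical helper for illustration only, not a Lucene API.
    """
    n_chars = -(-data_bits // BITS_PER_CHAR)      # ceiling division
    pad = n_chars * BITS_PER_CHAR - data_bits     # unused padding bits
    if not right_justify:
        value <<= pad                             # padding goes on the right
    chars = []
    for i in reversed(range(n_chars)):            # most significant char first
        chars.append(chr((value >> (i * BITS_PER_CHAR)) & 0x7F))
    return "".join(chars)


def total_shared_prefix(terms):
    """Sum of characters each term shares with its predecessor when sorted."""
    terms = sorted(terms)
    return sum(len(os.path.commonprefix([a, b]))
               for a, b in zip(terms, terms[1:]))


right = [encode(v, 9, True) for v in range(512)]   # 00000xx xxxxxxx
left = [encode(v, 9, False) for v in range(512)]   # xxxxxxx xx00000

print(len({t[0] for t in right}), total_shared_prefix(right))  # 4 508
print(len({t[0] for t in left}), total_shared_prefix(left))    # 128 384
```

In this toy case the right-justified form has only 4 distinct leading
characters instead of 128, so more adjacent terms share their first
character. The actual savings in an index depend on the value
distribution and on how the terms are stored.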

-Yonik

