On Sat, Feb 7, 2009 at 6:04 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> An optimization might be to remove
> the lower 0 bits from the string, but it would not be needed. The strings
> are unique for one precision (no difference between 0-bits there or not).

Yes, one would certainly want to remove trailing bits that are insignificant.
To optimize index space, one would also want to "right justify" the encoded number for any bit range, to minimize variation on the left. This plays into Lucene's prefix compression.

For example, suppose we want to encode 7 bits per character (so that each character takes up only one byte in UTF-8), and we have 9 bits of data to encode. The two characters could be encoded like this (where x is a data bit):

  xxxxxxx xx00000

Or like this:

  00000xx xxxxxxx

The latter is more efficient in index space, since many more values will share the same leading bits.

-Yonik
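
The difference between the two layouts can be checked with a small sketch. This is not Lucene code: the function names encode and shared_prefix_chars are hypothetical, and counting the characters each sorted term shares with its predecessor is only a rough stand-in for what prefix compression in a term dictionary would save. It assumes 7 data bits per character, as in the example above.

```python
def encode(value: int, nbits: int, right_justify: bool, bits_per_char: int = 7) -> str:
    """Pack the low `nbits` bits of `value` into characters of `bits_per_char` bits each.

    Hypothetical helper for illustration only; not part of Lucene.
    """
    nchars = -(-nbits // bits_per_char)          # ceiling division
    pad = nchars * bits_per_char - nbits         # unused bit positions
    value &= (1 << nbits) - 1
    if not right_justify:
        value <<= pad                            # data first, zero padding at the end
    mask = (1 << bits_per_char) - 1
    # Emit the most significant character first so string order follows numeric order.
    return "".join(
        chr((value >> (i * bits_per_char)) & mask)
        for i in range(nchars - 1, -1, -1)
    )


def shared_prefix_chars(terms) -> int:
    """Total characters each term shares with the previous term in sorted order."""
    ordered = sorted(set(terms))
    total = 0
    for prev, cur in zip(ordered, ordered[1:]):
        n = 0
        while n < min(len(prev), len(cur)) and prev[n] == cur[n]:
            n += 1
        total += n
    return total


if __name__ == "__main__":
    values = range(1 << 9)  # every possible 9-bit value
    left = shared_prefix_chars(encode(v, 9, right_justify=False) for v in values)
    right = shared_prefix_chars(encode(v, 9, right_justify=True) for v in values)
    print("left-justified shared chars: ", left)
    print("right-justified shared chars:", right)
```

Over all 512 possible 9-bit values, the left-justified layout yields 128 distinct leading characters, while the right-justified layout yields only 4, so far more adjacent terms share their first character in the right-justified case.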