Hi Adrien,

Thank you for the great explanation!

Koji


On 2017/08/22 19:36, Adrien Grand wrote:
Yes, LUCENE-7730 is the issue.

On Tue, Aug 22, 2017 at 12:00, Koji Sekiguchi <[email protected] <mailto:[email protected]>> wrote:

    I thought LUCENE-6819 removed the single-byte float as well because, when describing the
    background of the ticket, you mentioned its poor precision. So I assumed (from the context)
    that the ticket had solved it.

    So the field length is still stored in a single byte, and the precision of the float is
    still not good? And the point of LUCENE-6819 is that we can set a more precise boost value
    if we want, because it no longer depends on the poor-precision single byte used for the
    field length?


We still use a single byte to store the norm. The difference is that we used to store ${index-boost} * ${length-norm}. Because index boosts could take any positive value, we could not make any assumption about this quantity that would have helped make storage more efficient. More concretely, length-norm was always between 0 and 1, so if you did not use index boosts, like most Lucene users, the final normalization factor was between 0 and 1 as well. Yet only 125 of the 256 byte values in the SmallFloat encoding that we used represent values between 0 and 1. So this feature was trading accuracy of the length normalization factor for a feature that was only used by a minority and could easily be replaced by a doc-value field.
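To make the 125-out-of-256 figure concrete, here is a minimal standalone sketch of a single-byte float in the style of the old norm encoding. It is written from my reading of the SmallFloat scheme (3 "mantissa" bits, zero-exponent point 15), so the function names and constants are assumptions rather than the Lucene source, and the 1/sqrt(length) length norm assumes the classic similarity with no index boost.

```python
import math
import struct

# Assumed layout constants for the single-byte float (illustrative, not copied
# from Lucene): shift for the "3 mantissa bits" layout, zero-exponent point 15.
MANTISSA_SHIFT = 24 - 3
ZERO_POINT = (63 - 15) << 3


def float_to_byte315(f: float) -> int:
    """Encode a float as an unsigned byte (0..255), truncating precision."""
    if f <= 0.0:
        return 0
    bits = struct.unpack(">I", struct.pack(">f", f))[0]  # IEEE-754 float32 bits
    small = bits >> MANTISSA_SHIFT
    if small <= ZERO_POINT:
        return 1      # positive underflow: smallest positive byte
    if small >= ZERO_POINT + 0x100:
        return 255    # overflow: saturate at the largest byte
    return small - ZERO_POINT


def byte315_to_float(b: int) -> float:
    """Decode an unsigned byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = (b << MANTISSA_SHIFT) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def count_bytes_in_unit_range() -> int:
    """Number of byte values whose decoded float lies in [0, 1]."""
    return sum(1 for b in range(256) if 0.0 <= byte315_to_float(b) <= 1.0)


def length_norm(length: int) -> float:
    """Assumed classic length norm without any index boost."""
    return 1.0 / math.sqrt(length)


if __name__ == "__main__":
    print(count_bytes_in_unit_range())
    for n in (1, 2, 3, 4, 5):
        b = float_to_byte315(length_norm(n))
        print(n, b, byte315_to_float(b))
```

In this sketch, more than half of the byte values decode to numbers above 1, which an unboosted length norm can never reach, so the range that matters is left with coarse steps.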

We actually went a bit further and started storing the document length rather than the precomputed length-normalization factor in the norms field. It is easier to reason about since we know that all values are positive integers and that we want better accuracy for lower values. This allowed us to encode lengths accurately up to 40, whereas the previous encoding considered 3 and 4 to be the same length, for instance. Beyond that, accuracy degrades progressively, as you can see on the LUCENE-7730 ticket.
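As a rough illustration of encoding lengths rather than floats, here is a minimal sketch of an integer-to-byte scheme of this kind: small values are stored as-is, and larger ones keep only their top 4 significant bits plus a shift. It is modeled on my understanding of the approach discussed on LUCENE-7730; the function names and exact constants are assumptions, so please check the actual source before relying on them.

```python
def long_to_int4(i: int) -> int:
    """Compress a non-negative int, keeping its 4 most significant bits."""
    if i < 0:
        raise ValueError("only non-negative values are supported")
    num_bits = i.bit_length()
    if num_bits < 4:
        return i                       # small values are stored exactly
    shift = num_bits - 4
    encoded = (i >> shift) & 0x07      # drop the implicit leading 1 bit
    return encoded | ((shift + 1) << 3)


def int4_to_long(i: int) -> int:
    """Inverse of long_to_int4, returning the lower bound of the bucket."""
    bits = i & 0x07
    shift = (i >> 3) - 1
    if shift == -1:
        return bits                    # value was stored exactly
    return (bits | 0x08) << shift      # restore the implicit leading 1 bit


# Byte values left over once the largest 32-bit int has been given a code;
# they are spent on representing the smallest lengths exactly.
MAX_INT4 = long_to_int4(2**31 - 1)
NUM_FREE_VALUES = 255 - MAX_INT4


def int_to_byte4(i: int) -> int:
    """Encode a non-negative length as an unsigned byte (0..255)."""
    if i < 0:
        raise ValueError("only non-negative values are supported")
    if i < NUM_FREE_VALUES:
        return i
    return NUM_FREE_VALUES + long_to_int4(i - NUM_FREE_VALUES)


def byte4_to_int(b: int) -> int:
    """Decode an unsigned byte produced by int_to_byte4."""
    if b < NUM_FREE_VALUES:
        return b
    return NUM_FREE_VALUES + int4_to_long(b - NUM_FREE_VALUES)


if __name__ == "__main__":
    for n in (3, 4, 39, 40, 41, 100, 1000):
        b = int_to_byte4(n)
        print(n, b, byte4_to_int(b))
```

In this sketch, lengths 3 and 4 get distinct bytes and every length up to 40 round-trips exactly; above that, neighbouring lengths start to share a byte and the buckets widen as lengths grow.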



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
