[
https://issues.apache.org/jira/browse/LUCENE-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755219#action_12755219
]
Doron Cohen commented on LUCENE-1908:
-------------------------------------
{quote}
The rationale behind the coarseness of the norms is that since the accuracy of
search engines in retrieving the documents that the user really wants is so
poor, only big differences matter. (It's not just poor "recall" against a
given query, but the difficulty that the user experiences in formulating a
proper query to express what they're really looking for in the first place.)
Doug wrote at least once about this some years back, but I haven't been
able to track down the post.
{quote}
Thanks! I too failed to find that post.
I like the part about the difficulty users have expressing their information
need in a query.
So I am updating the text like this:
{noformat}
However, the resulting norm value is encoded as a single byte before being
stored. At search time, the norm byte value is read from the index directory
and decoded back to a float norm value. This encoding/decoding, while reducing
index size, comes at the price of precision loss - it is not guaranteed that
decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.75.
Compression of norm values to a single byte saves memory at search time,
because once a field is referenced at search time, its norms - for all
documents - are maintained in memory.
The rationale supporting such lossy compression of norm values is that,
given the difficulty (and inaccuracy) users have in expressing their true
information need in a query, only big differences matter.
Finally, note that search time is too late to modify this norm part of
scoring, e.g. by using a different Similarity for search.
{noformat}
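To make the round-trip loss concrete, here is a minimal Java sketch of a
one-byte float encoding. The class name NormRoundTrip, the method names, and
the bit layout (three low-order bits kept below a five-bit exponent range,
with a fixed exponent offset) are assumptions chosen for illustration; they
may not match the encoding Lucene actually uses. The exact decoded value
depends on those details, so the sketch prints the results rather than
assuming them.

```java
public class NormRoundTrip {
    // Hypothetical layout for illustration only: keep the top bits of the
    // IEEE-754 float and shift the exponent range so it fits in one byte.
    private static final int KEPT_BITS = 3;
    private static final int ZERO_EXP = 15;
    private static final int ZERO_POINT = (63 - ZERO_EXP) << KEPT_BITS;

    /** Lossy encoding of a non-negative float into a single byte. */
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - KEPT_BITS);
        if (small <= ZERO_POINT) {
            // Zero and negatives map to 0; tiny positives map to the smallest code.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= ZERO_POINT + 0x100) {
            return (byte) -1; // too large: saturate at the largest code
        }
        return (byte) (small - ZERO_POINT);
    }

    /** Inverse of encode, up to the precision that was discarded. */
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - KEPT_BITS);
        bits += (63 - ZERO_EXP) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float[] samples = {0.5f, 0.89f, 0.9f, 1.0f};
        for (float x : samples) {
            byte b = encode(x);
            // Nearby inputs can share a byte, so decode(encode(x)) may differ from x.
            System.out.println(x + " -> byte " + (b & 0xff) + " -> " + decode(b));
        }
    }
}
```

With one byte per document per field, an index of 10 million documents needs
about 10 MB of norms per referenced field, versus about 40 MB if each norm
were kept as a 4-byte float.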
> Similarity javadocs for scoring function to relate more tightly to scoring
> models in effect
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-1908
> URL: https://issues.apache.org/jira/browse/LUCENE-1908
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1908.patch, LUCENE-1908.patch, LUCENE-1908.patch,
> LUCENE-1908.patch
>
>
> See discussion in the related issue.