Robert Muir created LUCENE-8031:
-----------------------------------

             Summary: DOCS_ONLY fields set incorrect length norms
                 Key: LUCENE-8031
                 URL: https://issues.apache.org/jira/browse/LUCENE-8031
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir
            Priority: Major


Term frequencies are discarded from the postings list in the DOCS_ONLY case, but 
they still count toward the length normalization, which looks like it may 
screw up scoring.
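
To make the mismatch concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class FieldStatsSketch and its methods are hypothetical stand-ins for the two per-field statistics involved, the total token count (which repetitions inflate) and the number of distinct terms (all that a frequency-free postings list can still tell apart).

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for per-field inversion statistics; hypothetical, not Lucene code.
final class FieldStatsSketch {

    // Total number of tokens in the field: every repetition counts.
    static int length(String[] tokens) {
        return tokens.length;
    }

    // Number of distinct terms in the field: repetitions are ignored.
    static int uniqueTermCount(String[] tokens) {
        Set<String> seen = new HashSet<>();
        for (String token : tokens) {
            seen.add(token);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] doc = {"lucene", "lucene", "lucene", "norms"};
        // prints "length=4 unique=2"
        System.out.println("length=" + length(doc) + " unique=" + uniqueTermCount(doc));
    }
}
```

With frequencies dropped, the three occurrences of "lucene" are indistinguishable from one at search time, yet a norm derived from the length still sees four tokens.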

I ran some quick experiments on LUCENE-8025 by encoding 
fieldInvertState.getUniqueTermCount(), and it seemed worth fixing (e.g. 
potentially a 20% or 30% improvement). Happy to do testing for real if we want 
to fix this.
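
As a rough illustration of why the choice of length statistic matters, the sketch below uses the textbook BM25 term-frequency factor with k1 = 1.2 and b = 0.75. It is an assumption-laden toy, not Lucene's BM25Similarity and not the experiment described above: the class DocsOnlyNormSketch, the two example documents, and the average lengths are all made up for illustration. The frequency is fixed at 1, as it would be for a field that stores no frequencies.

```java
// Textbook BM25 tf factor applied to a frequency-free field; hypothetical sketch.
final class DocsOnlyNormSketch {
    static final double K1 = 1.2;
    static final double B = 0.75;

    // BM25 term-frequency factor with document-length normalization.
    static double tfNorm(double freq, double docLen, double avgLen) {
        return freq * (K1 + 1) / (freq + K1 * (1 - B + B * docLen / avgLen));
    }

    public static void main(String[] args) {
        double freq = 1.0; // no stored frequencies: every match is scored as freq 1

        // docA = "x y"            (2 tokens, 2 unique terms)
        // docB = "x x x x x x y"  (7 tokens, 2 unique terms)
        double avgLength = (2 + 7) / 2.0;
        double avgUnique = (2 + 2) / 2.0;

        // Normalizing by token count penalizes docB for repetitions
        // that never contributed to its term frequency.
        double byLengthA = tfNorm(freq, 2, avgLength);
        double byLengthB = tfNorm(freq, 7, avgLength);

        // Normalizing by unique term count treats the two documents alike.
        double byUniqueA = tfNorm(freq, 2, avgUnique);
        double byUniqueB = tfNorm(freq, 2, avgUnique);

        System.out.println(byLengthA > byLengthB);  // prints "true"
        System.out.println(byUniqueA == byUniqueB); // prints "true"
    }
}
```

Under these assumptions the repeated term costs docB score through the length norm without earning anything back through the frequency, which is the asymmetry the unique-term-count encoding removes.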

But this seems tricky: today you can downgrade to DOCS_ONLY on the fly, and it's 
hard for me to think about that case (I think it's generally screwed up besides 
this, but still).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
