[
https://issues.apache.org/jira/browse/LUCENE-8031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375631#comment-16375631
]
ASF subversion and git services commented on LUCENE-8031:
---------------------------------------------------------
Commit 29e5b8abcee8a566cc057b862ab99c5ffef13a76 in lucene-solr's branch
refs/heads/master from [~rcmuir]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=29e5b8a ]
LUCENE-8031: DOCS_ONLY fields set incorrect length norm
> DOCS_ONLY fields set incorrect length norms
> -------------------------------------------
>
> Key: LUCENE-8031
> URL: https://issues.apache.org/jira/browse/LUCENE-8031
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-8031.patch
>
>
> Term frequencies are discarded from the postings list in the DOCS_ONLY case,
> but they still count against the length normalization, which looks like it
> may hurt scoring.
> I ran some quick experiments on LUCENE-8025 by encoding
> fieldInvertState.getUniqueTermCount(), and it seemed worth fixing (e.g. a
> potential 20% or 30% improvement). Happy to do testing for real if we want to
> fix it.
> But this seems tricky: today you can downgrade to DOCS_ONLY on the fly, and
> it's hard for me to reason about that case (I think it's generally broken
> besides this, but still).
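
To make the mismatch described above concrete, here is a minimal sketch in plain Java. It is not Lucene code and not the committed patch; the class and method names are hypothetical and exist only for illustration. It contrasts a field length that counts every token occurrence with one that counts each distinct term once, which is the kind of value fieldInvertState.getUniqueTermCount() was used for in the experiment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical illustration only; not Lucene code and not the committed patch. */
final class DocsOnlyLengthSketch {

    /** Field length counting every token occurrence (repeats included). */
    static int totalTermCount(List<String> tokens) {
        return tokens.size();
    }

    /** Field length counting each distinct term once. */
    static int uniqueTermCount(List<String> tokens) {
        return new HashSet<>(tokens).size();
    }

    public static void main(String[] args) {
        // With DOCS_ONLY, the postings keep no frequencies, so a repeated term
        // matches as if it occurred once, yet the first length still grows
        // with every repeat.
        List<String> tokens = Arrays.asList("spam", "spam", "spam", "spam", "eggs");
        System.out.println("total term count:  " + totalTermCount(tokens));
        System.out.println("unique term count: " + uniqueTermCount(tokens));
    }
}
```

Under this assumption, a document that repeats one term many times is treated as long by the first measure even though the repeats cannot contribute to its score, while the second measure keeps the length consistent with what the postings actually record.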