Robert Muir created LUCENE-8025:
-----------------------------------
Summary: compute avgdl correctly for DOCS_ONLY
Key: LUCENE-8025
URL: https://issues.apache.org/jira/browse/LUCENE-8025
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Spinoff of LUCENE-8007:
If you omit term frequencies, we should score as if all tf values were 1. This
is how it worked for e.g. ClassicSimilarity, and it makes it easy to understand
how scoring degrades.
However, for sims such as BM25, we currently bail out on computing the average
document length (and just return a bogus value of 1). That also breaks length
normalization, which is a separate concern from term frequency.
Instead of a bogus value, we should substitute sumDocFreq for sumTotalTermFreq
(every posting has a freq of 1, since frequencies were omitted).
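
A rough sketch of the proposed substitution, not the actual patch. The class
and method names here are hypothetical, and the sketch assumes that an
unavailable sumTotalTermFreq is reported as -1 (my understanding of the
statistics API at the time of this issue, so treat it as an assumption):

```java
// Hypothetical helper, not Lucene's actual code.
class AvgDocLength {

    // Assumption: sumTotalTermFreq == -1 means frequencies were omitted
    // (DOCS_ONLY), so the total token count is unavailable.
    static float avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        // With freqs omitted every posting counts as tf = 1, so the number
        // of postings (sumDocFreq) stands in for the number of tokens.
        long totalTokens = (sumTotalTermFreq == -1) ? sumDocFreq : sumTotalTermFreq;

        // If neither statistic is usable, keep the old fallback of 1.
        if (totalTokens <= 0 || docCount <= 0) {
            return 1f;
        }
        return (float) (totalTokens / (double) docCount);
    }
}
```

For example, a DOCS_ONLY field with sumDocFreq = 300 over 100 docs would get
an average length of 3 instead of the bogus 1.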