Robert Muir created LUCENE-8025:
-----------------------------------

             Summary: compute avgdl correctly for DOCS_ONLY
                 Key: LUCENE-8025
                 URL: https://issues.apache.org/jira/browse/LUCENE-8025
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Robert Muir


Spinoff of LUCENE-8007:

If you omit term frequencies, we should score as if all tf values were 1. This
is how it worked for e.g. ClassicSimilarity, and the resulting degradation is
easy to reason about.

However, for similarities such as BM25 we currently bail out of computing the
average document length (avgdl) and just return a bogus value of 1. That also
breaks length normalization, which is a separate concern from tf.
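To show why an avgdl of 1 is harmful, here is a minimal sketch of BM25's tf and length-normalization factor (IDF omitted). The class and method names are hypothetical and this is not Lucene's BM25Similarity code. It uses the common defaults k1 = 1.2 and b = 0.75, and dl stands for the document's field length.

```java
/**
 * Hypothetical sketch (not Lucene code): BM25's tf/length-normalization
 * factor, showing its dependence on avgdl. IDF is left out.
 */
public final class Bm25AvgdlDemo {
    static final double K1 = 1.2;  // common default
    static final double B = 0.75;  // common default

    /** tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl)) */
    static double tfNorm(double tf, double dl, double avgdl) {
        double lengthNorm = K1 * (1 - B + B * dl / avgdl);
        return tf * (K1 + 1) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        // Frequencies omitted, so tf = 1; a document whose field has 10 terms.
        double withRealAvgdl = tfNorm(1, 10, 10);  // document of average length
        double withBogusAvgdl = tfNorm(1, 10, 1);  // avgdl hard-coded to 1
        System.out.printf("avgdl=10: %.4f%n", withRealAvgdl);
        System.out.printf("avgdl=1:  %.4f%n", withBogusAvgdl);
    }
}
```

With avgdl = 1 the ratio dl/avgdl equals dl. Every multi-term document therefore looks far longer than average, and its factor drops from 1.0 to about 0.21 in this example.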

Instead of returning a bogus value, we should substitute sumDocFreq for
sumTotalTermFreq when computing avgdl: since you omitted frequencies, every
posting has an implicit freq of 1, so the two sums are equal.
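A minimal sketch of the proposed substitution follows. The parameter names mirror the statistics named above, but the method is hypothetical and not Lucene's implementation. It assumes that sumTotalTermFreq is reported as -1 when frequencies are omitted (DOCS_ONLY).

```java
/**
 * Hypothetical sketch (not Lucene code) of computing avgdl with the
 * sumDocFreq substitution for DOCS_ONLY fields.
 */
public final class AvgdlSketch {
    /**
     * Average field length. Assumption: sumTotalTermFreq == -1 means term
     * frequencies were omitted for the field.
     */
    static double avgFieldLength(long sumTotalTermFreq, long sumDocFreq, long docCount) {
        if (docCount <= 0) {
            return 1; // no document has the field; avoid dividing by zero
        }
        // With DOCS_ONLY every posting implicitly has freq 1, so the total
        // term frequency equals the number of postings, i.e. sumDocFreq.
        long total = (sumTotalTermFreq == -1) ? sumDocFreq : sumTotalTermFreq;
        return (double) total / docCount;
    }

    public static void main(String[] args) {
        // 4 documents, 20 postings, frequencies omitted
        System.out.println(avgFieldLength(-1, 20, 4)); // 5.0
        // same postings, but frequencies indexed (sumTotalTermFreq = 30)
        System.out.println(avgFieldLength(30, 20, 4)); // 7.5
    }
}
```

The DOCS_ONLY case now gets a real average (5.0 here) instead of the fixed value of 1.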




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
