Robert Muir created LUCENE-8011:
-----------------------------------

             Summary: Improve similarity explanations
                 Key: LUCENE-8011
                 URL: https://issues.apache.org/jira/browse/LUCENE-8011
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir


LUCENE-7997 improves the BM25 and Classic explain output to make it easier to read:

{noformat}
product of:
  2.2 = scaling factor, k1 + 1
  9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
    1.0 = n, number of documents containing term
    17927.0 = N, total number of documents with field
  0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
    979.0 = freq, occurrences of term within document
    1.2 = k1, term saturation parameter
    0.75 = b, length normalization parameter
    1.0 = dl, length of field
    1.0 = avgdl, average length of field
{noformat}
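
The arithmetic in the explanation above can be checked by hand. The following is a minimal Python sketch, not Lucene code: the function names (bm25_idf, bm25_tf) are made up for illustration, and the input values are copied from the explain output above. Lucene computes these values in single precision, so the results only agree to about six or seven significant digits.

```python
import math


def bm25_idf(n, N):
    """idf = log(1 + (N - n + 0.5) / (n + 0.5)), as printed in the explanation."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)), as printed in the explanation."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the explain output above.
k1, b = 1.2, 0.75
idf = bm25_idf(n=1.0, N=17927.0)
tf = bm25_tf(freq=979.0, k1=k1, b=b, dl=1.0, avgdl=1.0)

# The top-level node is "product of": scaling factor (k1 + 1), idf, and tf.
score = (k1 + 1) * idf * tf

print("idf   =", idf)
print("tf    =", tf)
print("score =", score)
```

Because every input is named in the output (n, N, freq, k1, b, dl, avgdl), the whole score can be reproduced from the explanation alone.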

Previously the output was pretty cryptic and used confusing terminology such as docCount/docFreq without explanation:
{noformat}
product of:
  0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
    449.0 = docFreq
    456.0 = docCount
  2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
    113659.0 = freq=113658
    1.2 = parameter k1
    0.75 = parameter b
    2300.5593 = avgFieldLength
    1048600.0 = fieldLength
{noformat}
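
For comparison, the old tfNorm is algebraically the same quantity as the new output's scaling factor (k1 + 1) multiplied by tf; the new format only factors it into two separately labeled nodes. The following is a minimal Python sketch of that equivalence, not Lucene code: the function names are hypothetical, and the inputs are copied from the old explain output above, taking the leading 113659.0 on the freq line as the term frequency (an assumption, since that line prints two different values).

```python
def old_tf_norm(freq, k1, b, field_length, avg_field_length):
    """Old form: (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )


def new_tf(freq, k1, b, dl, avgdl):
    """New form: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Values copied from the old explain output above.
freq, k1, b = 113659.0, 1.2, 0.75
field_length, avg_field_length = 1048600.0, 2300.5593

old = old_tf_norm(freq, k1, b, field_length, avg_field_length)
# New-style factoring: scaling factor (k1 + 1) times tf.
new = (k1 + 1) * new_tf(freq, k1, b, field_length, avg_field_length)

print("old tfNorm        =", old)
print("(k1 + 1) * new tf =", new)
```

Both expressions agree up to floating-point rounding, so the change affects only how the score is presented, not its value.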

We should fix the other similarities in the same way, so that their explanations are more practical.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

