Robert Muir created LUCENE-8011:
-----------------------------------
Summary: Improve similarity explanations
Key: LUCENE-8011
URL: https://issues.apache.org/jira/browse/LUCENE-8011
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
LUCENE-7997 improves the explain output of the BM25 and Classic similarities so that it is easier to follow:
{noformat}
product of:
2.2 = scaling factor, k1 + 1
9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
1.0 = n, number of documents containing term
17927.0 = N, total number of documents with field
0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
979.0 = freq, occurrences of term within document
1.2 = k1, term saturation parameter
0.75 = b, length normalization parameter
1.0 = dl, length of field
1.0 = avgdl, average length of field
{noformat}
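To make the formulas in this output concrete, here is a minimal Python sketch that recomputes the three factors from the leaf values shown above. It is not Lucene's code: the function names are hypothetical, and the arithmetic is done in double precision, so the last digits may differ slightly from the float values Lucene prints.

```python
import math


def bm25_idf(n, N):
    """idf as printed above: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    """tf as printed above: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Leaf values copied from the explanation above.
k1, b = 1.2, 0.75
scaling = k1 + 1                       # "scaling factor, k1 + 1"
idf = bm25_idf(n=1.0, N=17927.0)       # n, N
tf = bm25_tf(freq=979.0, k1=k1, b=b, dl=1.0, avgdl=1.0)

# The top-level line is the product of the three factors.
score = scaling * idf * tf

print(f"scaling={scaling:.1f} idf={idf:.6f} tf={tf:.7f} score={score:.5f}")
```

Because every leaf value appears in the explanation, a reader can reproduce each intermediate number this way without consulting the source.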
Previously the output was fairly cryptic and used confusing terms such as
docCount/docFreq without explanation:
{noformat}
product of:
0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
449.0 = docFreq
456.0 = docCount
2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
113659.0 = freq=113658
1.2 = parameter k1
0.75 = parameter b
2300.5593 = avgFieldLength
1048600.0 = fieldLength
{noformat}
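Comparing the two formulas, the old tfNorm is the new tf multiplied by the k1 + 1 scaling factor, which the new format reports as a separate line. The following Python sketch checks this on the leaf values from the old output. The function names are hypothetical and the arithmetic is double precision rather than Lucene's float math, so treat it as an illustration only.

```python
def old_tf_norm(freq, k1, b, field_length, avg_field_length):
    """Old form: (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    norm = k1 * (1 - b + b * field_length / avg_field_length)
    return (freq * (k1 + 1)) / (freq + norm)


def new_tf(freq, k1, b, dl, avgdl):
    """New form: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Leaf values copied from the old explanation above.
freq, k1, b = 113659.0, 1.2, 0.75
field_length, avg_field_length = 1048600.0, 2300.5593

old = old_tf_norm(freq, k1, b, field_length, avg_field_length)
new = (k1 + 1) * new_tf(freq, k1, b, field_length, avg_field_length)

# Both expressions describe the same quantity; only the factoring differs.
print(f"old tfNorm={old:.6f}  (k1 + 1) * tf={new:.6f}")
```

Factoring out k1 + 1 leaves tf as a value between 0 and 1, which is easier to read as a saturation level than the combined tfNorm.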
We should fix the other similarities in the same way, so that their
explanations are more practical too.