[ https://issues.apache.org/jira/browse/LUCENE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16274115#comment-16274115 ]
ASF GitHub Bot commented on LUCENE-8011:
----------------------------------------

Github user jpountz commented on the issue:

    https://github.com/apache/lucene-solr/pull/280

    Thanks @mayya-sharipova, this looks like great progress to me. Maybe we could go even further and do the following:
     - in the Axiomatic similarity, add abstract methods to allow sub classes to explain how tf, ln, etc. are computed,
     - make BasicModel.explain abstract to force sub classes to have their own explanation and include the formula,
     - make sure that our own sub classes of SimilarityBase extend explain (the one that returns an explanation) and include the formula in the explanation.

    For the record, there is not too much concern to have about backward compatibility since most of those classes (e.g. Axiomatic, BasicModel) are very expert classes and this change targets master.


> Improve similarity explanations
> -------------------------------
>
>                 Key: LUCENE-8011
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8011
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>              Labels: newdev
>
> LUCENE-7997 improves BM25 and Classic explains to better explain:
> {noformat}
> product of:
>   2.2 = scaling factor, k1 + 1
>   9.388654 = idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:
>     1.0 = n, number of documents containing term
>     17927.0 = N, total number of documents with field
>   0.9987758 = tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:
>     979.0 = freq, occurrences of term within document
>     1.2 = k1, term saturation parameter
>     0.75 = b, length normalization parameter
>     1.0 = dl, length of field
>     1.0 = avgdl, average length of field
> {noformat}
> Previously it was pretty cryptic and used confusing terminology like
> docCount/docFreq without explanation:
> {noformat}
> product of:
>   0.016547536 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
>     449.0 = docFreq
>     456.0 = docCount
>   2.1920826 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
>     113659.0 = freq=113658
>     1.2 = parameter k1
>     0.75 = parameter b
>     2300.5593 = avgFieldLength
>     1048600.0 = fieldLength
> {noformat}
> We should fix other similarities too in the same way, they should be more
> practical.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
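
For readers who want to check the numbers in the improved BM25 explanation quoted above, here is a minimal sketch that recomputes idf and tf from the two formulas given there and prints them as a nested explanation. It does not use Lucene: the class Bm25ExplainSketch, its Node type, and the method names are hypothetical and exist only for illustration. It also works in double precision, whereas the quoted values look like floats, so trailing digits may differ slightly.

```java
import java.util.ArrayList;
import java.util.List;

public class Bm25ExplainSketch {

    /** Hypothetical stand-in for an explanation node: a value, a description, and nested details. */
    public static final class Node {
        public final double value;
        public final String description;
        public final List<Node> details = new ArrayList<>();

        public Node(double value, String description, Node... details) {
            this.value = value;
            this.description = description;
            for (Node d : details) {
                this.details.add(d);
            }
        }

        /** Renders this node and its details, indenting two spaces per nesting level. */
        public String render(int depth) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                sb.append("  ");
            }
            sb.append(value).append(" = ").append(description).append('\n');
            for (Node d : details) {
                sb.append(d.render(depth + 1));
            }
            return sb.toString();
        }
    }

    /** idf as quoted in the issue: log(1 + (N - n + 0.5) / (n + 0.5)). */
    public static double idf(long n, long N) {
        return Math.log(1 + (N - n + 0.5) / (n + 0.5));
    }

    /** tf as quoted in the issue: freq / (freq + k1 * (1 - b + b * dl / avgdl)). */
    public static double tf(double freq, double k1, double b, double dl, double avgdl) {
        return freq / (freq + k1 * (1 - b + b * dl / avgdl));
    }

    /** Builds the explanation tree: score = (k1 + 1) * idf * tf, with each input named. */
    public static Node explain(double freq, long n, long N,
                               double k1, double b, double dl, double avgdl) {
        double idf = idf(n, N);
        double tf = tf(freq, k1, b, dl, avgdl);
        double scaling = k1 + 1;
        Node idfNode = new Node(idf,
            "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
            new Node(n, "n, number of documents containing term"),
            new Node(N, "N, total number of documents with field"));
        Node tfNode = new Node(tf,
            "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
            new Node(freq, "freq, occurrences of term within document"),
            new Node(k1, "k1, term saturation parameter"),
            new Node(b, "b, length normalization parameter"),
            new Node(dl, "dl, length of field"),
            new Node(avgdl, "avgdl, average length of field"));
        return new Node(scaling * idf * tf, "score, product of:",
            new Node(scaling, "scaling factor, k1 + 1"), idfNode, tfNode);
    }

    public static void main(String[] args) {
        // Inputs taken from the improved explanation quoted in the issue.
        System.out.print(explain(979, 1, 17927, 1.2, 0.75, 1, 1).render(0));
    }
}
```

Keeping the formula text inside each node's description, next to the named inputs, is the pattern the issue asks the other similarities to follow.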