[ https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220198#comment-16220198 ]
Adrien Grand commented on LUCENE-8015:
--------------------------------------
I looked into it: this similarity ends up computing something like this:
{code}
double tfn = // non-decreasing function of tf
return (tfn * C1) * (C2 / (tfn + 1)); // C1 and C2 are some constants
{code}
The issue is that even when tfn increases, the result might decrease if
{{tfn * C1}} is rounded down and/or {{C2 / (tfn + 1)}} is rounded up. One fix
I can think of is to make the values of tfn more discrete, e.g.:
{code}
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
index aacd246..554d12f 100644
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
@@ -108,7 +108,7 @@ public class DFRSimilarity extends SimilarityBase {
   @Override
   protected double score(BasicStats stats, double freq, double docLen) {
-    double tfn = normalization.tfn(stats, freq, docLen);
+    double tfn = (float) normalization.tfn(stats, freq, docLen); // cast to float on purpose to introduce gaps between consecutive values and prevent double rounding errors to make the score decrease when tfn increases
     return stats.getBoost() *
         basicModel.score(stats, tfn) * afterEffect.score(stats, tfn);
   }
{code}
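To make the rounding argument concrete, here is a minimal standalone sketch of the effect, not Lucene code. The class name {{TfnRoundingDemo}}, the constants 0.3 and 7.0 standing in for C1 and C2, the starting value 10.0 and the step count are all assumptions chosen for illustration. It walks tfn through consecutive representable values and counts how often the score drops, first stepping through adjacent doubles and then through adjacent floats, which is the spacing tfn has after the cast.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical demo class for illustration only; not part of Lucene. */
class TfnRoundingDemo {
  // Arbitrary positive constants standing in for C1 and C2 (assumed values).
  static final double C1 = 0.3;
  static final double C2 = 7.0;

  /** Same shape as the expression above; mathematically increasing in tfn. */
  static double score(double tfn) {
    return (tfn * C1) * (C2 / (tfn + 1));
  }

  /**
   * Starts at the given tfn, repeatedly moves to the next tfn value produced
   * by {@code next}, and counts how often the score drops between two steps.
   */
  static int countDecreases(double start, int steps, DoubleUnaryOperator next) {
    int decreases = 0;
    double tfn = start;
    double previous = score(tfn);
    for (int i = 0; i < steps; i++) {
      tfn = next.applyAsDouble(tfn);
      double current = score(tfn);
      if (current < previous) {
        decreases++;
      }
      previous = current;
    }
    return decreases;
  }

  public static void main(String[] args) {
    // tfn moves through adjacent doubles: the true increase per step is
    // smaller than the rounding error of the intermediate results.
    int doubleSteps = countDecreases(10.0, 100_000, x -> Math.nextUp(x));
    // tfn moves through adjacent floats, as it would after the (float) cast.
    int floatSteps = countDecreases(10.0, 100_000, x -> Math.nextUp((float) x));
    System.out.println("decreases with double spacing: " + doubleSteps);
    System.out.println("decreases with float spacing:  " + floatSteps);
  }
}
```

With float spacing, each step changes tfn by far more than the rounding error of the double arithmetic, so the second count should be 0. With double spacing, the true increase per step is only a fraction of an ulp of the result, so the first count is expected to be nonzero for these constants.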
Opinions?
> TestBasicModelIne.testRandomScoring failure
> -------------------------------------------
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
>
> reproduce with: ant test -Dtestcase=TestBasicModelIne
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu
> -Dtests.asserts=true -Dtests.file.encoding=UTF8