[ 
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220198#comment-16220198
 ] 

Adrien Grand commented on LUCENE-8015:
--------------------------------------

I looked into it. This similarity ends up computing something like this:

{code}
double tfn = ...; // some non-decreasing function of tf
return (tfn * C1) * (C2 / (tfn + 1)); // C1 and C2 are some constants
{code}

The issue is that even if tfn increases, the result might decrease if
{{tfn * C1}} is rounded down and/or {{C2 / (tfn + 1)}} is rounded up. One
fix that I can think of is to make the values of tfn more discrete, e.g.:

{code}
diff --git a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
index aacd246..554d12f 100644
--- a/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
+++ b/lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java
@@ -108,7 +108,7 @@ public class DFRSimilarity extends SimilarityBase {
 
   @Override
   protected double score(BasicStats stats, double freq, double docLen) {
-    double tfn = normalization.tfn(stats, freq, docLen);
+    double tfn = (float) normalization.tfn(stats, freq, docLen); // cast to float on purpose to introduce gaps between consecutive values and prevent double rounding errors from making the score decrease when tfn increases
     return stats.getBoost() *
         basicModel.score(stats, tfn) * afterEffect.score(stats, tfn);
   }

{code}
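
To make the rounding argument concrete, here is a small standalone sketch. It is not Lucene code: the class name {{TfnRoundingSketch}}, the helper names, the constants 0.3 and 0.7 and the start value 1.5 are all made up for illustration. It increases tfn by one double ulp at a time and counts how often the computed score goes down, with and without the cast to float.

```java
// Illustrative sketch only; not Lucene code. All names and constants are assumptions.
final class TfnRoundingSketch {

  // Same shape as the expression above: (tfn * C1) * (C2 / (tfn + 1)).
  static double score(double tfn, double c1, double c2) {
    return (tfn * c1) * (c2 / (tfn + 1));
  }

  // Mirrors the proposed change: round tfn to the nearest float, then widen back to double.
  static double discretize(double tfn) {
    return (float) tfn;
  }

  // Walks tfn upward one double ulp at a time and counts how often the score decreases.
  static int countDecreases(double start, int steps, double c1, double c2, boolean useFloatCast) {
    int decreases = 0;
    double tfn = start;
    double prev = score(useFloatCast ? discretize(tfn) : tfn, c1, c2);
    for (int i = 0; i < steps; i++) {
      tfn = Math.nextUp(tfn); // smallest possible increase of tfn
      double cur = score(useFloatCast ? discretize(tfn) : tfn, c1, c2);
      if (cur < prev) {
        decreases++;
      }
      prev = cur;
    }
    return decreases;
  }

  public static void main(String[] args) {
    double c1 = 0.3, c2 = 0.7; // arbitrary constants
    System.out.println("decreases with double tfn:     " + countDecreases(1.5, 1000, c1, c2, false));
    System.out.println("decreases with float-cast tfn: " + countDecreases(1.5, 1000, c1, c2, true));
  }
}
```

With the cast, all 1000 inputs collapse to the same float (1.5f), so the second count is 0 by construction. The first count depends on how the individual operations happen to round. This only illustrates the mechanism inside a single float gap; it does not prove that the score is monotonic across float steps.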

Opinions?

> TestBasicModelIne.testRandomScoring failure
> -------------------------------------------
>
>                 Key: LUCENE-8015
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8015
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>
> reproduce with: ant test  -Dtestcase=TestBasicModelIne 
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93 
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu 
> -Dtests.asserts=true -Dtests.file.encoding=UTF8



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
