[
https://issues.apache.org/jira/browse/LUCENE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16278354#comment-16278354
]
Adrien Grand commented on LUCENE-8015:
--------------------------------------
I think our best option is to specialize some combinations of basic model and
after effect. We should be able to specialize basic models G, IF, I(n) and
I(ne) with after effects B, L and NoAfterEffect and make them pass tests. For
instance, I tested this specialization of model G with after effect L to make
sure it actually passes the tests:
{code}
import java.util.Objects;

/** BasicModel G + AfterEffect L */
public class DFRSimilarityGL extends SimilarityBase {

  private final Normalization normalization;

  public DFRSimilarityGL(Normalization normalization) {
    this.normalization = Objects.requireNonNull(normalization);
  }

  @Override
  protected double score(BasicStats stats, double freq, double docLen) {
    double tfn = normalization.tfn(stats, freq, docLen);

    // approximation only holds true when F << N, so we use lambda = F / (N + F)
    double F = stats.getTotalTermFreq() + 1;
    double N = stats.getNumberOfDocuments();
    double lambda = F / (N + F);

    // -log(1 / (lambda + 1)) -> log(lambda + 1)
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);

    // basic model G uses (A + B * tfn)
    // after effect L takes the result and divides it by (1 + tfn)
    // so in the end we have (A + B * tfn) / (1 + tfn)
    // which we change to B - (B - A) / (1 + tfn) to reduce floating-point
    // accuracy issues
    // (since tfn appears only once, the result is guaranteed to be
    // non-decreasing with tfn)
    return B - (B - A) / (1 + tfn);
  }

  @Override
  public String toString() {
    return "DFR GL" + normalization.toString();
  }
}
{code}
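The rewrite in the last comment follows from (A + B * tfn) / (1 + tfn) = (B * (1 + tfn) - (B - A)) / (1 + tfn) = B - (B - A) / (1 + tfn). Since B - A = log2(1 / lambda) and lambda <= 1, the subtracted term is non-negative and shrinks as tfn grows. Below is a minimal standalone sketch for checking this numerically. It is not part of the patch: the class name GLScoreSketch and its methods are hypothetical, log2 is reimplemented locally instead of using the one from SimilarityBase, and the statistics in main are made-up values.

```java
/** Standalone sketch, not Lucene code: compares the two algebraic forms of the G + L score. */
final class GLScoreSketch {

  /** Local stand-in for SimilarityBase.log2 (assumption: base-2 logarithm). */
  static double log2(double x) {
    return Math.log(x) / Math.log(2);
  }

  /** lambda = F / (N + F), with F = totalTermFreq + 1 as in the snippet above. */
  static double lambda(long totalTermFreq, long numberOfDocuments) {
    double F = totalTermFreq + 1.0;
    double N = numberOfDocuments;
    return F / (N + F);
  }

  /** Direct form: (A + B * tfn) / (1 + tfn). */
  static double direct(double lambda, double tfn) {
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);
    return (A + B * tfn) / (1 + tfn);
  }

  /** Rewritten form: B - (B - A) / (1 + tfn), where tfn appears only once. */
  static double rewritten(double lambda, double tfn) {
    double A = log2(lambda + 1);
    double B = log2((1 + lambda) / lambda);
    return B - (B - A) / (1 + tfn);
  }

  public static void main(String[] args) {
    // Made-up statistics: totalTermFreq = 1000, numberOfDocuments = 1,000,000.
    double lambda = lambda(1000, 1_000_000);
    double previous = Double.NEGATIVE_INFINITY;
    for (double tfn = 0; tfn <= 64; tfn += 0.5) {
      double score = rewritten(lambda, tfn);
      if (score < previous) {
        throw new AssertionError("score decreased at tfn=" + tfn);
      }
      previous = score;
    }
    System.out.println("rewritten form was non-decreasing over the sampled tfn values");
  }
}
```

The two forms should agree up to rounding error. The difference is that the rewritten form applies one addition, one division and one subtraction to tfn in sequence, each of which preserves ordering under IEEE 754 rounding, whereas the direct form has tfn in both numerator and denominator, so rounding can make the quotient dip slightly.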
> TestBasicModelIne.testRandomScoring failure
> -------------------------------------------
>
> Key: LUCENE-8015
> URL: https://issues.apache.org/jira/browse/LUCENE-8015
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Attachments: LUCENE-8015-test.patch, LUCENE-8015_test_fangs.patch
>
>
> reproduce with: ant test -Dtestcase=TestBasicModelIne
> -Dtests.method=testRandomScoring -Dtests.seed=86E85958B1183E93
> -Dtests.slow=true -Dtests.locale=vi-VN -Dtests.timezone=Pacific/Tongatapu
> -Dtests.asserts=true -Dtests.file.encoding=UTF8