Jonathan Hoag created LUCENE-5343:
-------------------------------------
Summary: Potential floating point precision error in
ConjuncionScorer.score()
Key: LUCENE-5343
URL: https://issues.apache.org/jira/browse/LUCENE-5343
Project: Lucene - Core
Issue Type: Bug
Components: core/query/scoring
Affects Versions: 4.5, 3.6.1, 3.6.3, 4.5.1
Reporter: Jonathan Hoag
I have been investigating an issue with document scoring and found that the
ConjunctionScorer implements the score method in a way that can cause floating
point precision rounding issues. I noticed in some of my test cases that
documents that have not been merged/optimized (I'm not sure of the correct
terminology, they have a docNum of 0) have scorers added in a different order
than optimized documents. Using a float to maintain the sum of scores
introduces the potential for floating point precision errors. In turn this
causes the score that is returned from the ConjunctionScorer to be different
for some merged/unmerged documents that should have identical scores.
Example:
float sum1 = 0.0061859353f + 0.0061859353f + 0.0030929677f + 0.0030929677f +
0.0030929677f + 0.5010608f + 0.0061859353f;
float sum2 = 0.0061859353f + 0.0061859353f + 0.0061859353f + 0.0030929677f +
0.0030929677f + 0.0030929677f + 0.5010608f;
sum1 == 0.5288975; // Incorrect
sum2 == 0.52889746; // Correct
I also noticed that there is a comment in the 4.5.1 version of Lucene to the
effect of:
// TODO: sum into a double and cast to float if we ever send required clauses
to BS1
Is there a reason that this has not been implemented yet?
public float score() throws IOException {
double sum = 0.0d;
for (int i = 0; i < scorers.length; i++) {
sum += scorers[i].score();
}
return (float)sum;
}
--
This message was sent by Atlassian JIRA
(v6.1#6144)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]