The problem is more than worth it. Is the alternative to remove the optimization? I don't think being incorrect, or adding leniency to tests, is a valid option at all. In general, if we don't apply a general fix, it will just make more such optimizations harder: more Jenkins failures, more deltas in tests, just a bad direction.
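To make the failure mode concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class, the method names and the clause scores are all made up for illustration. It sums a MUST group and a SHOULD group separately and then adds the two partial sums, which is the shape of computation described for ReqOptSumScorer in the quoted thread below, and it compares doing that in float against accumulating in double and rounding to float once at the end.

```java
// Hypothetical illustration only; none of these names exist in Lucene.
final class ScoreSumSketch {

    // Sum each group in float, then add the two partial sums in float.
    static float sumAsFloat(float[] must, float[] should) {
        float m = 0f;
        for (float s : must) {
            m += s;
        }
        float o = 0f;
        for (float s : should) {
            o += s;
        }
        return m + o;
    }

    // Same grouping, but accumulate in double and round to float once.
    static float sumAsDouble(float[] must, float[] should) {
        double m = 0d;
        for (float s : must) {
            m += s;
        }
        double o = 0d;
        for (float s : should) {
            o += s;
        }
        return (float) (m + o);
    }

    public static void main(String[] args) {
        // Made-up clause scores, chosen so that 1.0f + d is a rounding tie
        // in float (d is half an ulp of 1.0f).
        float a = 1.0f;
        float b = 0x1p-24f;
        float d = 0x1p-24f;

        // Case 1: d is a SHOULD clause. Case 2: d is a MUST clause.
        float f1 = sumAsFloat(new float[] {a}, new float[] {b, d});
        float f2 = sumAsFloat(new float[] {a, d}, new float[] {b});
        float d1 = sumAsDouble(new float[] {a}, new float[] {b, d});
        float d2 = sumAsDouble(new float[] {a, d}, new float[] {b});

        System.out.println("float sums equal:  " + (f1 == f2));
        System.out.println("double sums equal: " + (d1 == d2));
    }
}
```

With these particular values the two float results differ by one ulp depending on which group d lands in, while the double-accumulated results agree. The values are picked to hit a rounding tie, so this only shows the mechanism; it says nothing about how often real queries trigger it.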
I guess what I propose is something more like: change Scorer.score() to return double, and use double precision internally in all scoring (similarity code too). But keep it a float in e.g. ScoreDoc/TopDocs: we just "export" that to the user at the end. This is really best practice anyway; we shouldn't be storing intermediate calculations as 32-bit floats. It would just be a generalization of what DisjunctionSumScorer etc. are already doing.

On Fri, Oct 21, 2016 at 8:34 AM, Adrien Grand <jpou...@gmail.com> wrote:
> I suspect we could do something on the Scorer API indeed, e.g. by giving
> scorers a way to expose the double value of the score. However, it's not
> clear to me that this problem is worth making the Scorer API more complex?
>
> On Fri, Oct 21, 2016 at 12:37, Robert Muir <rcm...@gmail.com> wrote:
>>
>> But maybe the old "trick" can still be used somehow: it just means using
>> double precision internally to erase most differences? Maybe it means
>> a change to the Scorer API or whatever, but I still think it's a good
>> practical solution (vs. something more extreme like Kahan summation). I
>> am sure it does not work if someone has like 500k boolean clauses or
>> for more extreme cases, but it prevents these problems for typical
>> cases like keyword searches.
>>
>>
>> On Fri, Oct 21, 2016 at 6:31 AM, Adrien Grand <jpou...@gmail.com> wrote:
>> > On Fri, Oct 21, 2016 at 12:20, Robert Muir <rcm...@gmail.com> wrote:
>> >>
>> >> What changed?
>> >
>> >
>> > The issue here is ReqOptSumScorer, which computes the scores of the
>> > MUST and SHOULD clauses separately and then sums them up. In that
>> > test case, in one case body:d is in the list of SHOULD clauses, and
>> > in the other case it is in the list of MUST clauses.
>> >
>> > For the same reason, "+a b", "+a +b" and "a +b" may return different
>> > scores on the same documents.
>> >
>> > I can undo the change if you think this is a blocker, but that would
>> > be disappointing, as it would mean that we cannot do other exciting
>> > changes like flattening nested disjunctions, since it would cause the
>> > same problem.
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org