The problem is more than worth it. The alternative is to remove the
optimization? I don't think being incorrect / adding leniency to tests
is a valid option at all. In general, if we don't apply a general fix,
it will just make more such optimizations harder: more Jenkins
failures, more deltas in tests, just a bad direction.

I guess what I propose is something more like: change Scorer.score()
to return double, and use double precision internally in all scoring
(also similarity code).

But keep it a float in e.g. ScoreDoc/TopDocs: we just "export" that to
the user at the end. This is really best practice anyway, we shouldn't
be storing intermediate calculations as 32-bit floats. It would just
be a generalization of what DisjunctionSumScorer etc. are already
doing.
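
To make the idea concrete, here is a rough sketch (plain Java, not actual
Lucene code; the class and method names are hypothetical, and the clause
scores are picked only to make float rounding visible). It sums the same
three scores in two groupings, roughly like MUST and SHOULD clauses being
summed separately versus together, first in float and then in double
with a single cast to float at the end:

```java
public class ScoreSumSketch {

    // Float accumulation, grouped as (a + b) + c.
    static float sumLeftFloat(float a, float b, float c) {
        return (a + b) + c;
    }

    // Float accumulation, grouped as a + (b + c).
    static float sumRightFloat(float a, float b, float c) {
        return a + (b + c);
    }

    // Double accumulation, grouped as (a + b) + c, rounded to float once at the end.
    static float sumLeftDouble(float a, float b, float c) {
        double sum = (double) a + b;
        sum += c;
        return (float) sum;
    }

    // Double accumulation, grouped as a + (b + c), rounded to float once at the end.
    static float sumRightDouble(float a, float b, float c) {
        double tail = (double) b + c;
        return (float) (a + tail);
    }

    public static void main(String[] args) {
        // Illustrative values only: 2^24 is where float spacing becomes 2,
        // so adding 1.0f to it is lost to rounding.
        float a = 16777216f;
        float b = 1f;
        float c = 1f;

        System.out.println("float groupings agree:  "
            + (sumLeftFloat(a, b, c) == sumRightFloat(a, b, c)));
        System.out.println("double groupings agree: "
            + (sumLeftDouble(a, b, c) == sumRightDouble(a, b, c)));
    }
}
```

With these inputs the two float groupings disagree and the two double
groupings agree. Double accumulation does not make addition associative,
it only pushes the rounding error far enough down that it usually
disappears in the final cast to float, which is why it should cover
typical queries but not the extreme ones.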


On Fri, Oct 21, 2016 at 8:34 AM, Adrien Grand <jpou...@gmail.com> wrote:
> I suspect we could do something on the Scorer API indeed, e.g. by giving
> scorers a way to expose the double value of the score. However, it's not
> clear to me that this problem is worth making the Scorer API more complex.
>
> On Fri, Oct 21, 2016 at 12:37, Robert Muir <rcm...@gmail.com> wrote:
>>
>> But maybe the old "trick" can still be used somehow: just means using
>> double precision internally to erase most differences? Maybe it means
>> a change to the Scorer API or whatever, but still I think it's a good
>> practical solution (vs. something more extreme like Kahan summation). I
>> am sure it does not work if someone has like 500k boolean clauses or
>> for more extreme cases, but it prevents these problems for typical
>> cases like keyword searches.
>>
>>
>> On Fri, Oct 21, 2016 at 6:31 AM, Adrien Grand <jpou...@gmail.com> wrote:
>> > On Fri, Oct 21, 2016 at 12:20, Robert Muir <rcm...@gmail.com> wrote:
>> >>
>> >> What changed?
>> >
>> >
>> > The issue here is ReqOptSumScorer, which computes the scores of the
>> > MUST and SHOULD clauses separately and then sums them up. In that test
>> > case, in one case body:d is in the list of SHOULD clauses, and in the
>> > other case it is in the list of MUST clauses.
>> >
>> > For the same reason, "+a b", "+a +b" and "a +b" may return different
>> > scores on the same documents.
>> >
>> > I can undo the change if you think this is a blocker, but that would be
>> > disappointing as it would mean that we cannot do other exciting changes
>> > like flattening nested disjunctions, since it would cause the same
>> > problem.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>

