Hi!

I see where the problem lies but I can't find a way to solve it.

First feature: one of the fields must be scored only once. If a document
matches this field several times (with different values), only the first
match counts toward the score.
A map is passed as an argument to the CustomScoreQuery; it records that
the document has already been scored once, so that every subsequent match
scores 0.
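A minimal plain-Java model of the map-based design described above may help pin down its semantics. It is not Lucene code: `OnceScoreClause` and `OnceScoreDemo` are hypothetical names, and `score(doc, rawScore)` stands in for whatever the real `customScore` override computes. Several clauses (one per field value) share one map, and only the first clause that reaches a document contributes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of one "only once score" clause; not the real CustomScoreProvider. */
final class OnceScoreClause {
    // Shared by all clauses of the field: doc id -> already scored?
    private final Map<Integer, Boolean> alreadyScored;

    OnceScoreClause(Map<Integer, Boolean> alreadyScored) {
        this.alreadyScored = alreadyScored;
    }

    /** Returns rawScore the first time a doc is seen by any clause, 0 afterwards. */
    float score(int doc, float rawScore) {
        if (alreadyScored.containsKey(doc)) {
            return 0f;
        }
        alreadyScored.put(doc, Boolean.TRUE);
        return rawScore;
    }
}

final class OnceScoreDemo {
    public static void main(String[] args) {
        Map<Integer, Boolean> shared = new HashMap<>();
        OnceScoreClause valueX = new OnceScoreClause(shared); // clause for one value
        OnceScoreClause valueY = new OnceScoreClause(shared); // clause for another value
        // Doc 7 matches both values: only the first match counts.
        float total = valueX.score(7, 2.0f) + valueY.score(7, 1.5f);
        System.out.println(total); // 2.0
    }
}
```

Note that the result depends on call history, not only on the document, which is what makes repeated invocations a problem.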

Second feature: another CustomScoreQuery multiplies each sub-score by a
factor based on the date of the document, so a document A that matches
better than a document B but is older may receive a lower final score
than B.
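To illustrate how a multiplicative date factor can reverse an order, here is a small sketch. The post does not give the actual formula; the exponential half-life decay below (`DateFactor`, `halfLifeDays`) is an assumption chosen only to make the arithmetic concrete.

```java
/**
 * Hypothetical date factor: exponential decay with a half-life in days.
 * Only an illustration; the real formula is not given in the post.
 */
final class DateFactor {
    /** 1.0 for a brand-new document, 0.5 after one half-life, 0.25 after two. */
    static double factor(double ageDays, double halfLifeDays) {
        return Math.pow(0.5, ageDays / halfLifeDays);
    }

    /** Final score = sub-score multiplied by the date factor. */
    static double finalScore(double subScore, double ageDays, double halfLifeDays) {
        return subScore * factor(ageDays, halfLifeDays);
    }

    public static void main(String[] args) {
        double a = finalScore(3.0, 60, 30); // better match, but 60 days old
        double b = finalScore(2.0, 0, 30);  // weaker match, brand new
        System.out.println(a < b); // true: B now outranks A
    }
}
```

With a 30-day half-life, document A ends at 3.0 × 0.25 = 0.75 while document B keeps 2.0, so the weaker but newer match ranks first.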

The final total score (only once score field + date factor) is computed
correctly (the Explanation shows it), but in some cases, because of the
date correction, the ranking is wrong: a document with a lower final
score is ranked before a document with a higher one.

In scoreDoc.toString(), the score=... part and the fields=[score, ...]
part do not show the same score value: the one in fields=[] is smaller,
and the difference equals the score of the "only once score" field
multiplied by the date factor.
The fields part reflects the Sort passed to the IndexSearcher.
This difference exists for every hit, whether the document contains the
"only once score" field once or several times.
Why this difference?
While debugging, I see that IndexSearcher.search enters the "only once
score" CustomScoreQuery at least a second time, and it is the 0 score
that is finally retained, since the map already records that each
matching document has been scored.

I can't figure out how to solve this problem, and I'm not sure there is
a solution at all, since one score depends on a previous score. I've
tried the FunctionQuery route without success, but I'm not sure that
technique applies here either.

Am I making a mistake somewhere? The only workaround I can see is to
re-sort all the hits at the end, outside Lucene.
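The re-sorting workaround mentioned above could look like this sketch. It assumes the hits are first copied from the `ScoreDoc`s into a hypothetical `Hit` holder (id, `ScoreDoc.score`, date as a long), and that ties should go to the newer document; flip the date comparator if the Sort uses ascending dates. It only fixes the order of the hits actually returned, so enough hits (ideally all matching ones) must be fetched for the top of the re-sorted list to be correct.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical holder for one hit copied out of a ScoreDoc. */
final class Hit {
    final String id;
    final float score; // ScoreDoc.score, the value that agrees with the Explanation
    final long date;   // document date, e.g. epoch millis (assumption)

    Hit(String id, float score, long date) {
        this.id = id;
        this.score = score;
        this.date = date;
    }
}

final class HitResorter {
    /** Score descending; ties broken by date, newest first (assumption). */
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparingLong((Hit h) -> h.date).reversed());

    /** Returns a re-sorted copy; the input list is left untouched. */
    static List<Hit> resort(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(ORDER);
        return copy;
    }
}
```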

I would be very happy if someone could point me to a better solution.


Thanks in advance. Claude Lepère

On 2022/02/21 09:56:18 Claude Lepere wrote:
> Hi! I have a question about sorting: I don't understand why, in a test, a
> hit with a lower score is ranked before hits with higher scores.
>
> I am using Lucene 5.2.1.
>
> Two CustomScoreQuery subqueries on two fields, subquery 1 and subquery 2,
> and two test cases:
>
> case 1: the two calculated custom scores are multiplied by the same factor,
> based on the date of the matching document, at the end of the customScore
> method of the CustomScoreProvider
>
> case 2: the two calculated custom scores are *not* multiplied by the date
> factor.
>
> All tests use the same Sort: by score, then by date.
>
> Case 1: with date factor:
>
> Test 1: subquery 1 only:
>
> two hits: doc A (date A) gets the score A1, doc B (date B) gets the score
> B1; score A1 > score B1, date A < date B, and doc A is ranked before doc B
>
> Explanation:
>
> doc A score A1 shardIndex=0 fields=[score A1, date A]
>
> doc B score B1 shardIndex=0 fields=[score B1, date B]
>
> That's correct.
>
> Test 2: MUST query on subquery 1 and subquery 2:
>
> the same two docs match: doc A (date A) gets the score A2, doc B (date B)
> gets the score B2; score A2 *<* score B2, date A < date B, and *doc A is
> ranked before doc B*
>
> Explanation:
>
> doc A score A2 shardIndex=0 fields=[score A1, date A]
>
> doc B score B2 shardIndex=0 fields=[score B1, date B]
>
> *doc A is ranked before doc B although score A2 < score B2 and sorting
> should use scores A2 and B2, not A1 and B1.*
>
> Case 2: without date factor:
>
> Test 1: subquery 1 only:
>
> doc A (date A) gets the score A1, doc B (date B) gets the score B1: score
> A1 > score B1, date A < date B, and doc A is ranked before doc B
>
> Explanation:
>
> doc A score A1 shardIndex=0 fields=[score A1, date A]
>
> doc B score B1 shardIndex=0 fields=[score B1, date B]
>
> Test 2: MUST query on subquery 1 and subquery 2:
>
> the same two docs match: doc A (date A) gets the score A2, doc B (date B)
> gets the score B2; score A2 *>* score B2, date A < date B, and doc A is
> ranked before doc B
>
> Explanation:
>
> doc A score A2 shardIndex=0 fields=[score A1, date A]
>
> doc B score B2 shardIndex=0 fields=[score B1, date B]
>
> Using score A1 here works: without the date factor, all the hits of test 2
> match subquery 2 in the same way and get the same sub-score. In this case
> the explanation shows that score = fields[0] score + the common sub-score
> of the hits, so sorting by the current score gives the same order as
> sorting by the fields[0] score.
>
> But with the date factor this is no longer true: the sort [score, date]
> should use the current scores of test 2, not those of test 1.
>
> Please, could someone enlighten me? Am I making a mistake somewhere?
>
> Claude Lepère
>
