tomglk commented on a change in pull request #151:
URL: https://github.com/apache/solr/pull/151#discussion_r640664451
##########
File path:
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
##########
@@ -999,20 +1013,34 @@ protected void mergeIds(ResponseBuilder rb, ShardRequest sreq) {
shardDoc.sortFieldValues = unmarshalledSortFieldValues;
- queue.insertWithOverflow(shardDoc);
+ if(reRankQueue != null && docCounter++ <= reRankDocsSize) {
+ ShardDoc droppedShardDoc = reRankQueue.insertWithOverflow(shardDoc);
+ // FIXME: Only works if the original request does not sort by score
Review comment:
The current solution only works if the original request does not sort by
score, because the scores of the documents are overwritten during
reRanking.
The reRankDocs parameter, which specifies how many documents should be
reRanked, is applied per shard, but it also has to be applied when the
shard results are combined.
Each shard response may therefore contain documents that were reRanked on
the shard but should not count as reRanked in the combined result.
Example:
reRankDocs = 2
**shard1:** doc_1 (score 200, reRanked), doc_2 (score 100, reRanked,
original score 40), doc_3 (score 30)
**shard2:** doc_4 (score 300, reRanked), doc_5 (score 50, reRanked, original
score 25), doc_6 (score 20)
**expected result:**
doc_4 (score 300, reRanked), doc_1 (score 200, reRanked), doc_2, _doc_3,
doc_5_, doc_6
**actual result:**
doc_4 (score 300, reRanked), doc_1 (score 200, reRanked), doc_2, _doc_5,
doc_3_, doc_6
The problem is that we compare scores after reRanking (doc_2 and doc_5) with
scores before reRanking (doc_3 and doc_6).
We have no access to the score before reRanking at this point, and I
currently see no way to retrieve it.
Depending on the reRanking algorithm used, the two kinds of score may differ
greatly, which results in an incorrect ordering of the results at positions
beyond reRankDocs.
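To make the example above concrete, here is a minimal standalone sketch of the two orderings. It is not the code from this PR: `ReRankMergeSketch`, `Doc`, and both merge methods are hypothetical names, and the `originalScore` field is exactly the information that is not available during the real merge. The sketch also assumes that the head of the combined result is simply the top reRankDocs documents by reported score, which matches the example but may not hold in general.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReRankMergeSketch {

    /** Hypothetical stand-in for a shard document; this is not Solr's ShardDoc. */
    static final class Doc {
        final String id;
        final double reportedScore; // score returned by the shard (reRanked or not)
        final double originalScore; // pre-reRanking score; NOT available in the real merge

        Doc(String id, double reportedScore, double originalScore) {
            this.id = id;
            this.reportedScore = reportedScore;
            this.originalScore = originalScore;
        }
    }

    /** The six documents from the example, with reRankDocs = 2 applied on each shard. */
    static List<Doc> exampleDocs() {
        double unknown = Double.NaN; // original score not given in the example
        return Arrays.asList(
            new Doc("doc_1", 200, unknown), new Doc("doc_2", 100, 40), new Doc("doc_3", 30, 30),
            new Doc("doc_4", 300, unknown), new Doc("doc_5", 50, 25), new Doc("doc_6", 20, 20));
    }

    static List<String> ids(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }

    /** Sorts by the score each shard reported, mixing reRanked and original scores. */
    static List<String> mergeByReportedScore(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> d.reportedScore).reversed());
        return ids(sorted);
    }

    /** Keeps the top reRankDocs by reported score, then orders the rest by original score. */
    static List<String> mergeWithOriginalScore(List<Doc> docs, int reRankDocs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> d.reportedScore).reversed());
        int cut = Math.min(reRankDocs, sorted.size());
        List<Doc> result = new ArrayList<>(sorted.subList(0, cut));
        List<Doc> tail = new ArrayList<>(sorted.subList(cut, sorted.size()));
        tail.sort(Comparator.comparingDouble((Doc d) -> d.originalScore).reversed());
        result.addAll(tail);
        return ids(result);
    }

    public static void main(String[] args) {
        List<Doc> docs = exampleDocs();
        System.out.println("actual:   " + mergeByReportedScore(docs));
        System.out.println("expected: " + mergeWithOriginalScore(docs, 2));
    }
}
```

The two lists differ only in the order of doc_3 and doc_5. The second method produces the expected order only because it reads `originalScore`, which is the value the merge step currently cannot see.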
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]