[ https://issues.apache.org/jira/browse/LUCENE-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860697#comment-16860697 ]
Adrien Grand commented on LUCENE-8829:
--------------------------------------

bq. You mean that use docID based tie breaking iff setShardIndex = false && all docs have shardIndex as -1?

I mean ordering on score or sort fields, then shardIndex, then docID, all the time. In the case that we are interested in, all documents will have a shardIndex of -1, so this would be equivalent to sorting on score or sort fields and then docID. I don't think we need to proactively check all TopDocs. That said, it would probably be a bug if some hits have a shardIndex and others don't (value == -1), so maybe we could check this on the ScoreDocs that we are seeing, instead of the existing check that we have today when setShardIndex==false and shardIndex==-1?

> TopDocs#Merge is Tightly Coupled To Number Of Collectors Involved
> -----------------------------------------------------------------
>
>                 Key: LUCENE-8829
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8829
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8829.patch, LUCENE-8829.patch, LUCENE-8829.patch
>
>
> While investigating LUCENE-8819, I understood that the order of results
> returned by TopDocs#merge is indirectly dependent on the number of
> collectors involved in the merge. This is troubling because 1) the number
> of collectors involved in a merge is cost based and directly dependent on
> the number of slices created for the parallel searcher case, and 2) the
> TopN hits code path will invoke merge with a single Collector, so running
> the same TopN query with a single-threaded searcher and with a parallel
> searcher can produce a different order of results, which breaks an
> invariant that should hold.
>
> The reason this happens is the subtle way TopDocs#merge sets shardIndex on
> each ScoreDoc while populating the priority queue used for merging.
> shardIndex is essentially set to the ordinal of the collector that
> generated the hit. This means that shardIndex depends on the number of
> collectors, even for the same set of hits.
>
> When no sort order is specified, shardIndex is used for tie breaking when
> scores are equal. This translates to different orders for the same hits
> with different shardIndex values.
>
> I propose that we remove shardIndex from the default tie breaking mechanism
> and replace it with docID. DocID order is the de facto order expected
> during collection, so it might make sense to use the same factor for tie
> breaking when scores are the same.
>
> CC: [~ivera]
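
For illustration, the ordering described in the comment above (score, then shardIndex, then docID) and the suggested consistency check could be sketched roughly as follows. This is a minimal standalone sketch and not the code in the attached patches: `Hit` is a hypothetical stand-in for ScoreDoc, `TieBreakSketch`, `compare`, `checkConsistentShardIndex` and `sortedDocs` are made-up names, and only the score-based case (no sort fields) is covered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TieBreakSketch {

    /** Hypothetical stand-in for ScoreDoc; not the real Lucene class. */
    public static final class Hit {
        public final float score;
        public final int shardIndex; // -1 means "not set", as in the discussion
        public final int doc;

        public Hit(float score, int shardIndex, int doc) {
            this.score = score;
            this.shardIndex = shardIndex;
            this.doc = doc;
        }
    }

    /** Higher score first, then lower shardIndex, then lower docID. */
    public static int compare(Hit a, Hit b) {
        int c = Float.compare(b.score, a.score);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(a.shardIndex, b.shardIndex);
        if (c != 0) {
            return c;
        }
        return Integer.compare(a.doc, b.doc);
    }

    /** Assumed check: reject input where only some hits carry a shardIndex. */
    public static void checkConsistentShardIndex(List<Hit> hits) {
        boolean anySet = false;
        boolean anyUnset = false;
        for (Hit h : hits) {
            if (h.shardIndex == -1) {
                anyUnset = true;
            } else {
                anySet = true;
            }
        }
        if (anySet && anyUnset) {
            throw new IllegalArgumentException(
                "some hits have a shardIndex and others do not");
        }
    }

    /** Validates the hits, sorts a copy, and returns the docIDs in order. */
    public static List<Integer> sortedDocs(List<Hit> hits) {
        checkConsistentShardIndex(hits);
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(TieBreakSketch::compare);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : copy) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        // Same four hits; shardIndex unset everywhere, so ties fall back to docID.
        List<Hit> unset = Arrays.asList(
            new Hit(1.0f, -1, 5), new Hit(1.0f, -1, 2),
            new Hit(1.0f, -1, 9), new Hit(2.0f, -1, 7));
        System.out.println(sortedDocs(unset)); // [7, 2, 5, 9]

        // Same hits with collector ordinals as shardIndex: ties now depend on them.
        List<Hit> withShards = Arrays.asList(
            new Hit(1.0f, 0, 5), new Hit(1.0f, 1, 2),
            new Hit(1.0f, 0, 9), new Hit(2.0f, 1, 7));
        System.out.println(sortedDocs(withShards)); // [7, 5, 9, 2]
    }
}
```

With shardIndex left at -1 for every hit, the middle comparison never decides anything, so the result no longer depends on how many collectors produced the hits. The second call shows the coupling described in the issue: the same hits come back in a different order once collector ordinals are used as shardIndex.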