Sure thing! Here is the Jira: https://issues.apache.org/jira/browse/SOLR-17976

On Tue, Oct 21, 2025 at 6:24 PM Chris Hostetter <[email protected]> wrote:
>
> Ugh.
>
> I think you are 100% correct, the merge logic *should* use the "shard
> name" as the tie-breaker.
>
> As for how to fix this...
>
> The key hiccup is that the concept of a "shard" predates the concept of
> a "shard name" -- going back to before "SolrCloud" was an idea, and you
> could send solr a "distributed search" request by specifying a ','
> separated list of "shards", where each shard was a '|' separated list of
> "replica urls"
>
> much of the low level code still works that way, and only the higher level
> code uses the cluster state to map a "shard name" to a "list of replica
> urls".
>
> By the time the code gets low enough down to where/when a ShardDoc is
> constructed, I don't think the "shard name" info is in scope.
>
>
> Either way: Can you please file a bug report capturing this discussion ...
> not sure how hard it will be to fix, but we should at least track it (even
> if it probably has been broken for 10+ years)
>
>
> : Date: Tue, 21 Oct 2025 17:16:09 -0500
> : From: Yue Yu <[email protected]>
> : Reply-To: [email protected]
> : To: [email protected]
> : Subject: Solr 9.5 distributed search tie breaking logic is non-deterministic
> :
> : Hello,
> : In the mergeIds function of QueryComponent, the heap
> : ShardFieldSortedHitQueue is used to order the ShardDocs. However, in
> : the *lessThan* function:
> :
> :   protected boolean lessThan(ShardDoc docA, ShardDoc docB) {
> :     // If these docs are from the same shard, then the relative order
> :     // is how they appeared in the response from that shard.
> :     if (Objects.equals(docA.shard, docB.shard)) {
> :       // if docA has a smaller position, it should be "larger" so it
> :       // comes before docB.
> :       // This will handle sorting by docid within the same shard
> :
> :       // comment this out to test comparators.
> :       return !(docA.orderInShard < docB.orderInShard);
> :     }
> :
> :     // run comparators
> :     final int n = comparators.length;
> :     int c = 0;
> :     for (int i = 0; i < n && c == 0; i++) {
> :       c =
> :           (fields[i].getReverse())
> :               ? comparators[i].compare(docB, docA)
> :               : comparators[i].compare(docA, docB);
> :     }
> :
> :     // solve tiebreaks by comparing shards (similar to using docid)
> :     // smaller docid's beat larger ids, so reverse the natural ordering
> :     if (c == 0) {
> :       c = -docA.shard.compareTo(docB.shard);
> :     }
> :
> :     return c < 0;
> :   }
> :
> : The last tie-breaking logic is comparing ShardDoc.shard:
> :
> :     // solve tiebreaks by comparing shards (similar to using docid)
> :     // smaller docid's beat larger ids, so reverse the natural ordering
> :     if (c == 0) {
> :       c = -docA.shard.compareTo(docB.shard);
> :     }
> :
> :
> : Here ShardDoc.shard contains the node ip as well as the shard name, for
> : example:
> : http://127.0.0.1:8983/solr/my_collection_shard1_replica_n1
> : Consider this setup: 1 collection with 2 shards and 2 replicas running on
> : a 2 node cluster. For the same query, we may have documents coming from
> : the following core combinations:
> :
> :    1. http://node1_ip:8983/solr/my_collection_shard1_replica_n1 +
> :       http://node2_ip:8983/solr/my_collection_shard2_replica_n2
> :    2. http://node2_ip:8983/solr/my_collection_shard1_replica_n2 +
> :       http://node1_ip:8983/solr/my_collection_shard2_replica_n1
> :
> : Hence the same request may have different document rankings when there are
> : documents from both shards with the same scores. This can get worse with
> : more nodes/shards/replicas.
> : I'm wondering if we should just use the shard name for tie breaking instead
> : (no node ip), if that's possible
> :
> : Thank you,
> : Yue
> :
>
> -Hoss
> http://www.lucidworks.com/

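To make the idea from the thread concrete, here is a minimal sketch of a tie-break that compares only the shard name, so the result does not depend on which node served each replica. This is not Solr code: the class ShardNameTieBreak, the helpers shardNameOf and tieBreak, and the regex are all hypothetical, and the sketch assumes core names follow the "<collection>_shardN_replica_..." form shown in the example URLs. Custom shard names would not match and fall back to the full string. As Hoss noted, a real fix would more likely need to carry the shard name down from the cluster state rather than parse it out of a URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShardNameTieBreak {
    // Assumption: the core name embeds the shard name as "_shardN_replica_",
    // as in http://127.0.0.1:8983/solr/my_collection_shard1_replica_n1
    private static final Pattern SHARD_IN_CORE =
            Pattern.compile("_(shard\\d+)_replica_");

    /** Returns the shard name if it can be parsed, else the input unchanged. */
    static String shardNameOf(String shardAddress) {
        Matcher m = SHARD_IN_CORE.matcher(shardAddress);
        return m.find() ? m.group(1) : shardAddress;
    }

    /** Same sign convention as the quoted lessThan: reversed natural order. */
    static int tieBreak(String shardA, String shardB) {
        return -shardNameOf(shardA).compareTo(shardNameOf(shardB));
    }

    public static void main(String[] args) {
        // The two replica combinations from the original report.
        String a1 = "http://node1_ip:8983/solr/my_collection_shard1_replica_n1";
        String b1 = "http://node2_ip:8983/solr/my_collection_shard2_replica_n2";
        String a2 = "http://node2_ip:8983/solr/my_collection_shard1_replica_n2";
        String b2 = "http://node1_ip:8983/solr/my_collection_shard2_replica_n1";

        // Comparing full addresses: the sign flips between the combinations.
        System.out.println(Integer.signum(-a1.compareTo(b1)));
        System.out.println(Integer.signum(-a2.compareTo(b2)));

        // Comparing shard names only: the sign is the same for both.
        System.out.println(Integer.signum(tieBreak(a1, b1)));
        System.out.println(Integer.signum(tieBreak(a2, b2)));
    }
}
```

With the full addresses, the shard1 document wins the tie in one combination and loses it in the other. With the shard-name comparison, the outcome is the same whichever replicas answer.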