Re: Solr 9.5 distributed search tie breaking logic is non-deterministic

Yue Yu Wed, 22 Oct 2025 11:10:46 -0700

Sure thing! Here is the Jira:
https://issues.apache.org/jira/browse/SOLR-17976
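
To make the problem in the quoted thread concrete, here is a minimal sketch of the final tie-break in isolation. It is not Solr code: the class name, the helper methods, and the regular expression are hypothetical, and the regex assumes core names follow the <collection>_<shardN>_replica_<type><id> pattern seen in the example URLs. It compares the two replica combinations from the original report, once keyed on the full replica URL (as ShardDoc.shard is today) and once keyed on an extracted shard name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names exist in Solr.
final class ShardTieBreakSketch {

  // Assumption: core names look like <collection>_<shardN>_replica_<type><id>.
  private static final Pattern CORE_NAME =
      Pattern.compile("/([^/]+)_(shard\\d+)_replica_[a-z]\\d+/?$");

  // Extracts "shard1" from ".../my_collection_shard1_replica_n1".
  // Falls back to the full URL when the pattern does not match.
  static String shardName(String replicaUrl) {
    Matcher m = CORE_NAME.matcher(replicaUrl);
    return m.find() ? m.group(2) : replicaUrl;
  }

  // Mirrors the quoted tie-break: c = -a.compareTo(b); return c < 0;
  static boolean lessThanByUrl(String a, String b) {
    return -a.compareTo(b) < 0;
  }

  // Same tie-break, but keyed on the shard name only (no node address).
  static boolean lessThanByShardName(String a, String b) {
    return -shardName(a).compareTo(shardName(b)) < 0;
  }

  public static void main(String[] args) {
    // Combination 1: shard1 served by node1, shard2 served by node2.
    String s1a = "http://node1_ip:8983/solr/my_collection_shard1_replica_n1";
    String s2a = "http://node2_ip:8983/solr/my_collection_shard2_replica_n2";
    // Combination 2: shard1 served by node2, shard2 served by node1.
    String s1b = "http://node2_ip:8983/solr/my_collection_shard1_replica_n2";
    String s2b = "http://node1_ip:8983/solr/my_collection_shard2_replica_n1";

    System.out.println("by URL,   combination 1: " + lessThanByUrl(s1a, s2a));
    System.out.println("by URL,   combination 2: " + lessThanByUrl(s1b, s2b));
    System.out.println("by shard, combination 1: " + lessThanByShardName(s1a, s2a));
    System.out.println("by shard, combination 2: " + lessThanByShardName(s1b, s2b));
  }
}
```

With the URL as the key, the result for "shard1 doc vs. shard2 doc" flips between the two combinations, because the node address is compared before the core name. With the shard name as the key, the result is the same in both. Whether a shard name is actually available where ShardDoc is built is the open question raised in the reply below.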


On Tue, Oct 21, 2025 at 6:24 PM Chris Hostetter <[email protected]>
wrote:

>
> Ugh.
>
> I think you are 100% correct, the merge logic *should* use the "shard
> name" as the tie-breaker.
>
> As for how to fix this...
>
> The key hiccup is that the concept of a "shard" predates the concept of
> a "shard name" -- going back to before "SolrCloud" was an idea, and you
> could send Solr a "distributed search" request by specifying a ','
> separated list of "shards", where each shard was a '|' separated list of
> "replica urls".
>
> Much of the low level code still works that way, and only the higher level
> code uses the cluster state to map a "shard name" to a "list of replica
> urls".
>
> By the time the code gets low enough down to where/when a ShardDoc is
> constructed, I don't think the "shard name" info is in scope.
>
>
> Either way: Can you please file a bug report capturing this discussion ...
> not sure how hard it will be to fix, but we should at least track it (even
> if it probably has been broken for 10+ years)
>
>
>
>
> : Date: Tue, 21 Oct 2025 17:16:09 -0500
> : From: Yue Yu <[email protected]>
> : Reply-To: [email protected]
> : To: [email protected]
> : Subject: Solr 9.5 distributed search tie breaking logic is non-deterministic
> :
> : Hello,
> : In the mergeIds function of QueryComponent, the
> : ShardFieldSortedHitQueue heap is used to order the ShardDoc instances.
> : However, in the *lessThan* function:
> :
> : protected boolean lessThan(ShardDoc docA, ShardDoc docB) {
> :   // If these docs are from the same shard, then the relative order
> :   // is how they appeared in the response from that shard.
> :   if (Objects.equals(docA.shard, docB.shard)) {
> :     // if docA has a smaller position, it should be "larger" so it
> :     // comes before docB.
> :     // This will handle sorting by docid within the same shard
> :
> :     // comment this out to test comparators.
> :     return !(docA.orderInShard < docB.orderInShard);
> :   }
> :
> :   // run comparators
> :   final int n = comparators.length;
> :   int c = 0;
> :   for (int i = 0; i < n && c == 0; i++) {
> :     c =
> :         (fields[i].getReverse())
> :             ? comparators[i].compare(docB, docA)
> :             : comparators[i].compare(docA, docB);
> :   }
> :
> :   // solve tiebreaks by comparing shards (similar to using docid)
> :   // smaller docid's beat larger ids, so reverse the natural ordering
> :   if (c == 0) {
> :     c = -docA.shard.compareTo(docB.shard);
> :   }
> :
> :   return c < 0;
> : }
> :
> : The last tie-breaking logic is comparing ShardDoc.shard:
> :
> : // solve tiebreaks by comparing shards (similar to using docid)
> : // smaller docid's beat larger ids, so reverse the natural ordering
> : if (c == 0) {
> :   c = -docA.shard.compareTo(docB.shard);
> : }
> :
> :
> : Here ShardDoc.shard contains the node IP as well as the shard name, for
> : example:
> : http://127.0.0.1:8983/solr/my_collection_shard1_replica_n1
> : Consider this setup: 1 collection with 2 shards and 2 replicas per shard,
> : running on a 2-node cluster. For the same query, we may have documents
> : coming from the following core combinations:
> :
> :    1. http://node1_ip:8983/solr/my_collection_shard1_replica_n1 +
> :    http://node2_ip:8983/solr/my_collection_shard2_replica_n2
> :    2. http://node2_ip:8983/solr/my_collection_shard1_replica_n2 +
> :    http://node1_ip:8983/solr/my_collection_shard2_replica_n1
> :
> : Hence the same request may have different document rankings when there
> : are documents from both shards with the same scores. This can get worse
> : with more nodes/shards/replicas.
> : I'm wondering if we should just use the shard name for tie breaking
> : instead (no node IP), if that's possible.
> :
> : Thank you,
> : Yue
> :
>
> -Hoss
> http://www.lucidworks.com/
>
>
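
The legacy "shards" request format described in the quoted reply can be sketched as follows. This is an illustration under stated assumptions, not Solr's actual parsing code: the class and method names are hypothetical, and the host/core values in the usage note are made up. The point it shows is that the parsed structure holds only replica URLs, so a shard name is not recoverable from it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the Solr code base.
final class ShardsParamSketch {

  // Splits a legacy-style value: ',' separates shards and '|' separates
  // the replica URLs that can serve each shard.
  static List<List<String>> parse(String shardsParam) {
    List<List<String>> shards = new ArrayList<>();
    for (String shard : shardsParam.split(",")) {
      // '|' is a regex metacharacter, so it must be escaped for split().
      shards.add(Arrays.asList(shard.split("\\|")));
    }
    return shards;
  }
}
```

For example, parse("host1:8983/solr/core1|host2:8983/solr/core1,host3:8983/solr/core2") yields two shards, the first with two interchangeable replica URLs and the second with one. Nothing in that result carries a name like "shard1", which matches the observation that the shard name is probably not in scope by the time a ShardDoc is constructed.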
