Yue Yu created SOLR-17976:
-----------------------------
Summary: Solr 9.5 distributed search tie breaking logic is
non-deterministic
Key: SOLR-17976
URL: https://issues.apache.org/jira/browse/SOLR-17976
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Yue Yu
In the mergeIds function of QueryComponent, the ShardFieldSortedHitQueue heap
is used to order the ShardDoc entries. However, in its *lessThan* function:

    protected boolean lessThan(ShardDoc docA, ShardDoc docB) {
      // If these docs are from the same shard, then the relative order
      // is how they appeared in the response from that shard.
      if (Objects.equals(docA.shard, docB.shard)) {
        // if docA has a smaller position, it should be "larger" so it
        // comes before docB.
        // This will handle sorting by docid within the same shard

        // comment this out to test comparators.
        return !(docA.orderInShard < docB.orderInShard);
      }

      // run comparators
      final int n = comparators.length;
      int c = 0;
      for (int i = 0; i < n && c == 0; i++) {
        c =
            (fields[i].getReverse())
                ? comparators[i].compare(docB, docA)
                : comparators[i].compare(docA, docB);
      }

      // solve tiebreaks by comparing shards (similar to using docid)
      // smaller docid's beat larger ids, so reverse the natural ordering
      if (c == 0) {
        c = -docA.shard.compareTo(docB.shard);
      }
      return c < 0;
    }

The last tie-breaking logic compares ShardDoc.shard:

    // solve tiebreaks by comparing shards (similar to using docid)
    // smaller docid's beat larger ids, so reverse the natural ordering
    if (c == 0) {
      c = -docA.shard.compareTo(docB.shard);
    }

Here ShardDoc.shard contains the node IP as well as the shard name, for example:
[http://127.0.0.1:8983/solr/my_collection_shard1_replica_n1]
Consider this setup: one collection with 2 shards and 2 replicas per shard,
running on a 2-node cluster. For the same query, we may have documents coming
from the following core combinations:
# [http://node1_ip:8983/solr/my_collection_shard1_replica_n1] +
[http://node2_ip:8983/solr/my_collection_shard2_replica_n2]
# [http://node2_ip:8983/solr/my_collection_shard1_replica_n2] +
[http://node1_ip:8983/solr/my_collection_shard2_replica_n1]
Hence the same request may have different document rankings when there are
documents from both shards with the same scores. This can get worse with more
nodes/shards/replicas.
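
To make the flip concrete, here is a minimal standalone sketch (not Solr code). It reproduces only the final tie-break from lessThan on plain strings, using the two core combinations listed above. The class name ShardUrlTieBreak and the helper rankedFirst are hypothetical, and the sketch assumes that the doc reported as "less than" the other is the one ranked lower, as the "smaller docid's beat larger ids" comment suggests.

```java
class ShardUrlTieBreak {
    // Same expression as the final tie-break: c = -a.compareTo(b); return c < 0.
    static boolean lessThan(String shardA, String shardB) {
        return -shardA.compareTo(shardB) < 0;
    }

    // Hypothetical helper: returns the shard string whose doc would rank first
    // in a tie, assuming the "less than" doc is the one ranked lower.
    static String rankedFirst(String shardA, String shardB) {
        return lessThan(shardA, shardB) ? shardB : shardA;
    }

    public static void main(String[] args) {
        // Combination 1: shard1 served from node1, shard2 served from node2.
        System.out.println(rankedFirst(
            "http://node1_ip:8983/solr/my_collection_shard1_replica_n1",
            "http://node2_ip:8983/solr/my_collection_shard2_replica_n2"));
        // Combination 2: shard1 served from node2, shard2 served from node1.
        System.out.println(rankedFirst(
            "http://node2_ip:8983/solr/my_collection_shard1_replica_n2",
            "http://node1_ip:8983/solr/my_collection_shard2_replica_n1"));
    }
}
```

In both cases the string starting with the lexically smaller host wins, so the tied document from shard1 ranks first in one combination and the tied document from shard2 ranks first in the other.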
I'm wondering if we should just use the shard name for tie breaking instead
(without the node IP), if that's possible.
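
As a rough sketch of that idea (not a patch), the tie-break could compare only the shard name extracted from the core URL. Everything here is an assumption for illustration: the class ShardNameTieBreak, the helper shardName, and the regex, which only handles core names of the form seen above (..._shardN_replica_...). Custom shard names would need a different source for the shard name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShardNameTieBreak {
    // Assumption: the core name embeds the shard as "_shardN_replica_".
    private static final Pattern SHARD_IN_CORE = Pattern.compile("_(shard\\d+)_replica_");

    // Hypothetical helper: extracts "shardN" from a core URL, falling back to
    // the full string when the pattern does not match.
    static String shardName(String shardUrl) {
        Matcher m = SHARD_IN_CORE.matcher(shardUrl);
        return m.find() ? m.group(1) : shardUrl;
    }

    // Same shape as the existing tie-break, but on the shard name only.
    static boolean lessThan(String shardUrlA, String shardUrlB) {
        return -shardName(shardUrlA).compareTo(shardName(shardUrlB)) < 0;
    }

    public static void main(String[] args) {
        // Both replica placements now give the same answer for shard1 vs shard2.
        System.out.println(lessThan(
            "http://node1_ip:8983/solr/my_collection_shard1_replica_n1",
            "http://node2_ip:8983/solr/my_collection_shard2_replica_n2"));
        System.out.println(lessThan(
            "http://node2_ip:8983/solr/my_collection_shard1_replica_n2",
            "http://node1_ip:8983/solr/my_collection_shard2_replica_n1"));
    }
}
```

With this comparison the result no longer depends on which node serves each replica, since the node address is never part of the compared value.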