[jira] [Created] (SOLR-17519) CloudSolrClient with Solr ClusterState can forget live nodes and then fail

David Smiley (Jira) Sun, 27 Oct 2024 14:53:18 -0700

David Smiley created SOLR-17519:
-----------------------------------

             Summary: CloudSolrClient with Solr ClusterState can forget live nodes and then fail
                 Key: SOLR-17519
                 URL: https://issues.apache.org/jira/browse/SOLR-17519
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud, SolrJ
            Reporter: David Smiley



When using CloudSolrClient with HTTP URLs to Solr as the source of cluster 
state: if all live nodes known to the client disappear temporarily (perhaps 
during a hard cluster restart), the client can permanently fail to talk to the 
cluster, and would then need to be restarted to recover.

Credit [~ilan] on the dev list:
{quote}The current implementation removes non-live nodes from the set of nodes 
to connect to. Getting the live nodes requires connecting to a specific node in 
the cluster that is therefore live when that happens. Worst case, if there is a 
single node up in the cluster, the client ends with a single node in its 
connection candidates list. For the issue to manifest, that Solr node then has 
to go down. Subsequently, even if other nodes are up, the client only has the 
address of a down node and can't connect.

The fix is not a big deal. Nodes initially passed as configuration to the 
client should never be removed from the set of candidate nodes to connect to, 
even if they are not live. Other live nodes could be added to that set (and 
removed from it if we so desire when they are no longer live) to increase 
resiliency in case the cluster does have live nodes but all initially 
configured nodes are not live. The design issue is treating the configured set 
of nodes to connect to and the set of live nodes as one thing.
{quote}
See org.apache.solr.client.solrj.impl.BaseHttpClusterStateProvider
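
The separation described in the quote could be sketched roughly as follows. This is an illustration only, not the actual BaseHttpClusterStateProvider code: the class CandidateNodes and its methods are hypothetical names, and the choice to try discovered live nodes before configured ones is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: keeps configured nodes and discovered live nodes as two separate sets. */
final class CandidateNodes {
    // Nodes passed to the client as configuration; never removed, even when not live.
    private final Set<String> configuredNodes;
    // Nodes learned from the latest live-nodes response; replaced wholesale on each refresh.
    private Set<String> discoveredLiveNodes = new LinkedHashSet<>();

    CandidateNodes(Collection<String> configuredNodes) {
        this.configuredNodes = new LinkedHashSet<>(configuredNodes);
    }

    /** Record the live nodes reported by whichever node answered the last request. */
    synchronized void onLiveNodesFetched(Collection<String> liveNodes) {
        discoveredLiveNodes = new LinkedHashSet<>(liveNodes);
    }

    /** Nodes to try: discovered live nodes first (assumed ordering), then every configured node. */
    synchronized List<String> candidates() {
        Set<String> all = new LinkedHashSet<>(discoveredLiveNodes);
        all.addAll(configuredNodes); // configured nodes always remain candidates
        return new ArrayList<>(all);
    }
}
```

With this shape, a refresh that reports a single live node (or none) shrinks only the discovered set. The configured URLs stay in the candidate list, so the client still has addresses to retry once the cluster comes back.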



