Shalin Shekhar Mangar created SOLR-6530:
-------------------------------------------

             Summary: Commits under network partition can put any node in down state by any node
                 Key: SOLR-6530
                 URL: https://issues.apache.org/jira/browse/SOLR-6530
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
            Reporter: Shalin Shekhar Mangar
            Priority: Critical
             Fix For: 4.11, 5.0


In SolrCloud, commits can be executed by any node, i.e. unlike other updates, they are not routed via the leader.
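To make the contrast concrete, here is a minimal Python sketch of the two delivery paths. It assumes a single shard, and the `route` helper is hypothetical: it only lists who sends to whom and is a simplification, not Solr's actual update-distribution code.

```python
from __future__ import annotations


def route(kind: str, receiver: str, leader: str, replicas: list[str]) -> list[tuple[str, str]]:
    """Return the (sender, target) hops that deliver one request in this toy model.

    The hop layout is a simplified assumption, not Solr's actual code path.
    """
    if kind in ("commit", "optimize"):
        # Commits/optimizes: the receiving node fans the request out to every
        # other replica itself, whether or not it is the leader.
        return [(receiver, r) for r in replicas if r != receiver]
    # Other updates: a non-leader forwards to the leader, which then
    # distributes to the remaining replicas.
    hops = [] if receiver == leader else [(receiver, leader)]
    hops += [(leader, r) for r in replicas if r != leader]
    return hops


print(route("commit", receiver="B", leader="A", replicas=["A", "B"]))  # [('B', 'A')]
print(route("add", receiver="B", leader="A", replicas=["A", "B"]))     # [('B', 'A'), ('A', 'B')]
```

When B receives a commit, the only hop is B to A, so B (not the leader) is the node that notices when A is unreachable.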

# Suppose there is 1 collection with 1 shard and 2 replicas (A and B), and A is the leader.
# Suppose a commit request is sent to node B at a time when B cannot talk to A because of a partition, for any reason (failing switch, heavy GC, whatever).
# B fails to distribute the commit to A (the request times out) and asks A to recover.
# Previously this was harmless because a leader simply ignores recovery requests, but with the leader-initiated recovery code, B puts A into the "down" state and A can never get out of that state.
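The sequence above can be replayed in a toy Python model. Everything here is an assumption for illustration: the `Shard` class, its `lir_enabled` flag, and the rule that a down leader cannot recover (standing in for the report that A never leaves the "down" state) are hypothetical simplifications, not Solr's recovery code.

```python
class Shard:
    """Toy model of one shard's replica states (hypothetical, not Solr's code)."""

    def __init__(self, nodes, leader, lir_enabled=True):
        self.state = {n: "active" for n in nodes}
        self.leader = leader
        self.lir_enabled = lir_enabled
        self.cut = set()  # unordered node pairs separated by a partition

    def partition(self, a, b):
        self.cut.add(frozenset((a, b)))

    def heal(self):
        self.cut.clear()

    def reachable(self, a, b):
        return frozenset((a, b)) not in self.cut

    def commit(self, receiver):
        # The receiving node distributes the commit itself, not via the leader.
        for target in self.state:
            if target == receiver or self.reachable(receiver, target):
                continue
            # Distribution to target timed out, so receiver asks it to recover.
            if self.lir_enabled:
                # Leader-initiated recovery: the sender publishes the target
                # as "down", even when the target is the leader.
                self.state[target] = "down"
            # Without LIR the request leaves published state untouched here;
            # in the scenario above, the leader simply ignores it.

    def try_recover(self, node):
        # A down replica syncs from an active, reachable leader. A down leader
        # has no leader to sync from, so this model leaves it down.
        if (self.state[node] == "down" and node != self.leader
                and self.state[self.leader] == "active"
                and self.reachable(node, self.leader)):
            self.state[node] = "active"


shard = Shard(["A", "B"], leader="A")
shard.partition("A", "B")
shard.commit("B")       # B cannot reach A and marks the leader down
shard.heal()
shard.try_recover("A")  # no effect: A is the leader
print(shard.state)      # {'A': 'down', 'B': 'active'}
```

Constructing the shard with `lir_enabled=False` leaves A "active" after the same steps, which corresponds to the older behavior described in the last step.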

tl;dr: During a network partition, if enough commit/optimize requests are sent to the cluster, every node in the cluster will eventually be marked as "down".
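A rough Python sketch of how the damage accumulates. The `simulate` helper is hypothetical and rests on stated assumptions: every node is isolated from every other, nobody recovers while the partition lasts, and a node keeps distributing commits regardless of its own published state. Under those assumptions, two commits landing on different nodes are enough to mark all three replicas down.

```python
from itertools import combinations


def simulate(nodes, cut_pairs, commit_receivers):
    """Replay commits under a partition and return each node's published state.

    Toy model (hypothetical): a failed commit distribution marks the target
    "down", nothing recovers while the partition lasts, and a node keeps
    fanning out commits whatever its own state is.
    """
    state = {n: "active" for n in nodes}
    cut = {frozenset(pair) for pair in cut_pairs}
    for receiver in commit_receivers:
        for target in nodes:
            if target != receiver and frozenset((receiver, target)) in cut:
                # The commit to target times out; receiver marks it down.
                state[target] = "down"
    return state


nodes = ["A", "B", "C"]
isolated = list(combinations(nodes, 2))  # every node cut off from every other
print(simulate(nodes, isolated, ["A"]))       # {'A': 'active', 'B': 'down', 'C': 'down'}
print(simulate(nodes, isolated, ["A", "B"]))  # {'A': 'down', 'B': 'down', 'C': 'down'}
```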



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
