Cao Manh Dat created SOLR-12187:
-----------------------------------

             Summary: Replica should watch clusterstate and unload itself if 
its entry is removed
                 Key: SOLR-12187
                 URL: https://issues.apache.org/jira/browse/SOLR-12187
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Cao Manh Dat
            Assignee: Cao Manh Dat


With the introduction of the autoscaling framework, we have seen an increase in 
the number of issues caused by race conditions between deleting a replica and 
other operations.

Case 1: DeleteReplicaCmd fails to send an UNLOAD request to a replica and 
therefore forcefully removes its entry from clusterstate, but the replica still 
functions normally and is able to become a leader -> SOLR-12176
Case 2:
 * DeleteReplicaCmd enqueues a DELETECOREOP (without sending a request to the 
replica, because the node is not live)
 * The node starts and the replica gets loaded
 * The DELETECOREOP has not been processed yet, so the replica is still present 
in clusterstate --> it passes checkStateInZk
 * The DELETECOREOP is executed and DeleteReplicaCmd finishes
 ** result 1: the replica starts recovering, finishes, and publishes itself as 
ACTIVE --> state of the replica is ACTIVE
 ** result 2: the replica throws an exception (probably an NPE) 
--> state of the replica is DOWN, and it does not join leader election
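
The behaviour proposed in the summary (a replica watches clusterstate and 
unloads itself once its entry is gone) can be sketched roughly as below. This 
is only an illustration, not Solr code: FakeClusterState, FakeReplica and all 
of their methods are hypothetical stand-ins, and clusterstate is reduced to a 
plain set of replica names. The watcher is also invoked once at registration, 
on the assumption that this would cover a replica that loads after its entry 
was already removed.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ReplicaSelfUnloadSketch {

    /** Hypothetical stand-in for clusterstate: only the set of registered replica names. */
    public static class FakeClusterState {
        private final Set<String> replicas = ConcurrentHashMap.newKeySet();
        private final List<Consumer<Set<String>>> watchers = new CopyOnWriteArrayList<>();

        public void addReplica(String name) {
            replicas.add(name);
            notifyWatchers();
        }

        public void removeReplica(String name) {
            replicas.remove(name);
            notifyWatchers();
        }

        /** Registers a watcher and immediately calls it with the current state. */
        public void watch(Consumer<Set<String>> watcher) {
            watchers.add(watcher);
            watcher.accept(snapshot());
        }

        private Set<String> snapshot() {
            return Collections.unmodifiableSet(new HashSet<>(replicas));
        }

        private void notifyWatchers() {
            Set<String> current = snapshot();
            for (Consumer<Set<String>> watcher : watchers) {
                watcher.accept(current);
            }
        }
    }

    /** Hypothetical replica that unloads itself when its entry disappears. */
    public static class FakeReplica {
        private final String name;
        private volatile boolean loaded = true;

        public FakeReplica(String name, FakeClusterState state) {
            this.name = name;
            // Unload as soon as a state update no longer lists this replica.
            state.watch(current -> {
                if (loaded && !current.contains(this.name)) {
                    unload();
                }
            });
        }

        private void unload() {
            loaded = false;
        }

        public boolean isLoaded() {
            return loaded;
        }
    }

    public static void main(String[] args) {
        FakeClusterState state = new FakeClusterState();
        state.addReplica("core_node1");
        FakeReplica replica = new FakeReplica("core_node1", state);
        System.out.println("loaded before delete: " + replica.isLoaded());
        state.removeReplica("core_node1");
        System.out.println("loaded after delete: " + replica.isLoaded());
    }
}
```

In this sketch it does not matter whether the entry is removed before or after 
the replica loads; in both orders the replica ends up unloaded instead of 
recovering or staying DOWN.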



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)