Cao Manh Dat created SOLR-12187:
-----------------------------------
Summary: Replica should watch clusterstate and unload itself if
its entry is removed
Key: SOLR-12187
URL: https://issues.apache.org/jira/browse/SOLR-12187
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Cao Manh Dat
Assignee: Cao Manh Dat
With the introduction of the autoscaling framework, we have seen an increase in
the number of issues related to race conditions between deleting a replica and
other operations.
Case 1: DeleteReplicaCmd fails to send an UNLOAD request to a replica and
therefore forcefully removes its entry from clusterstate, but the replica still
functions normally and is able to become a leader -> SOLR-12176
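The fix proposed in the summary (the replica watches clusterstate and unloads itself when its entry disappears) can be sketched for Case 1 roughly as below. This is a minimal, hypothetical model, not Solr code: `ClusterState`, `LocalReplica`, `watch` and `canBecomeLeader` are illustrative names, the ZooKeeper-backed cluster state is replaced by an in-memory map whose listeners fire synchronously on every change, and it assumes the replica's entry already exists when the watch is registered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for clusterstate: replica name -> published state.
class ClusterState {
    private final Map<String, String> replicas = new HashMap<>();
    private final List<Consumer<Map<String, String>>> watchers = new ArrayList<>();

    void put(String replica, String state) {
        replicas.put(replica, state);
        fire();
    }

    // Models DeleteReplicaCmd forcefully removing the replica's entry.
    void remove(String replica) {
        replicas.remove(replica);
        fire();
    }

    void watch(Consumer<Map<String, String>> watcher) {
        watchers.add(watcher);
    }

    // Notify every watcher with an immutable snapshot of the new state.
    private void fire() {
        Map<String, String> snapshot = Map.copyOf(replicas);
        for (Consumer<Map<String, String>> w : new ArrayList<>(watchers)) {
            w.accept(snapshot);
        }
    }
}

// Hypothetical local core hosting one replica.
class LocalReplica {
    final String name;
    private boolean loaded = true;

    LocalReplica(String name, ClusterState clusterState) {
        this.name = name;
        // Proposed behaviour: unload as soon as our own entry is gone,
        // even if the UNLOAD request from DeleteReplicaCmd never arrived.
        clusterState.watch(snapshot -> {
            if (loaded && !snapshot.containsKey(name)) {
                unload();
            }
        });
    }

    void unload() { loaded = false; }

    boolean isLoaded() { return loaded; }

    // In this model only a loaded replica may take part in leader election.
    boolean canBecomeLeader() { return loaded; }
}
```

Replaying Case 1: if the UNLOAD request is lost and the entry is then removed with `remove("core_node1")`, both `isLoaded()` and `canBecomeLeader()` return false, while changes to other replicas' entries leave the replica untouched.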
Case 2:
* DeleteReplicaCmd enqueues a DELETECOREOP (without sending a request to the
replica, because the node is not live)
* The node starts and the replica gets loaded
* DELETECOREOP has not been processed yet, hence the replica is still present in
clusterstate --> it passes checkStateInZk
* DELETECOREOP is executed and DeleteReplicaCmd finishes
** result 1: the replica starts recovering, finishes, and publishes itself as
ACTIVE --> the state of the replica is ACTIVE
** result 2: the replica throws an exception (probably an NPE)
--> the state of the replica is DOWN and it does not join leader election
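The ordering in Case 2 can be replayed deterministically to show why a one-time check is not enough. This is a hypothetical model, not Solr code: `Case2Replay`, `removeEntry` and `run` are illustrative names, the overseer queue is a plain in-memory queue, the cluster state is a set of replica names, and only "replica stays loaded vs. unloads" is modelled (result 2, the exception path, is not). `checkStateInZk` here merely models a point-in-time lookup at load time.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical replay of the Case 2 step order.
class Case2Replay {
    private final Set<String> clusterState = new HashSet<>();
    private final Queue<Runnable> overseerQueue = new ArrayDeque<>();
    private final boolean watchEnabled;
    private boolean replicaLoaded = false;

    Case2Replay(boolean watchEnabled) {
        this.watchEnabled = watchEnabled;
    }

    // Point-in-time check done once when the core loads.
    boolean checkStateInZk(String replica) {
        return clusterState.contains(replica);
    }

    // Effect of DELETECOREOP: drop the entry; with the proposed watch,
    // the live replica reacts by unloading itself.
    void removeEntry(String replica) {
        clusterState.remove(replica);
        if (watchEnabled && replicaLoaded) {
            replicaLoaded = false;
        }
    }

    // Returns whether the replica is still loaded after DeleteReplicaCmd finishes.
    boolean run(String replica) {
        clusterState.add(replica);
        // 1. Node is not live: DeleteReplicaCmd only enqueues DELETECOREOP.
        overseerQueue.add(() -> removeEntry(replica));
        // 2. Node starts; the replica loads and passes the one-time check,
        //    because DELETECOREOP has not been processed yet.
        replicaLoaded = checkStateInZk(replica);
        // 3. DELETECOREOP is executed and DeleteReplicaCmd finishes.
        while (!overseerQueue.isEmpty()) {
            overseerQueue.poll().run();
        }
        return replicaLoaded;
    }
}
```

`new Case2Replay(false).run("core_node1")` returns true (the replica survives its own deletion), while `new Case2Replay(true).run("core_node1")` returns false. Since the check at step 2 runs before the removal, no stronger one-time check can observe it; only a watch that stays active after loading can.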
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)