[ 
https://issues.apache.org/jira/browse/SOLR-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439929#comment-16439929
 ] 

Shalin Shekhar Mangar commented on SOLR-12187:
----------------------------------------------

Hi Dat, the changes looks good to me. +1 to commit.

One improvement that we can make is to make the collection state watcher 
notifications more robust. Currently there is no exception handling in 
ZkStateReader.Notification thread, perhaps we should add some now that we rely 
so much on those notifications.

> Replica should watch clusterstate and unload itself if its entry is removed
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-12187
>                 URL: https://issues.apache.org/jira/browse/SOLR-12187
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12187.patch, SOLR-12187.patch, SOLR-12187.patch, 
> SOLR-12187.patch, SOLR-12187.patch
>
>
> With the introduction of autoscaling framework, we have seen an increase in 
> the number of issues related to the race condition between delete a replica 
> and other stuff.
> Case 1: DeleteReplicaCmd failed to send UNLOAD request to a replica, 
> therefore, forcefully remove its entry from clusterstate, but the replica 
> still function normally and be able to become a leader -> SOLR-12176
> Case 2:
>  * DeleteReplicaCmd enqueue a DELETECOREOP (without sending a request to 
> replica because the node is not live)
>  * The node start and the replica get loaded
>  * DELETECOREOP has not processed hence the replica still present in 
> clusterstate --> pass checkStateInZk
>  * DELETECOREOP is executed, DeleteReplicaCmd finished
>  ** result 1: the replica start recovering, finish it and publish itself as 
> ACTIVE --> state of the replica is ACTIVE
>  ** result 2: the replica throw an exception (probably: NPE) 
> --> state of the replica is DOWN, not join leader election



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to