Cao Manh Dat created SOLR-9945:
----------------------------------

             Summary: LIR should check the node is recovering before bring it down
                 Key: SOLR-9945
                 URL: https://issues.apache.org/jira/browse/SOLR-9945
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: Cao Manh Dat


When a node is recovering, the leader can hit an exception when it tries to 
forward an update to that node, which is still buffering updates. The leader then 
runs the leader-initiated recovery (LIR) process: it first sets the node's state 
to DOWN, then sends a recovery command to the node.
At the same time, the PrepRecoveryOp issued by the recovering node makes the 
leader wait a very long time to see the node's state become RECOVERING.
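To make the stall concrete, here is a toy model of the leader-side wait, assuming PrepRecoveryOp simply polls the node's published state until it reads RECOVERING or a poll budget runs out. The class `PrepRecoveryWaitSketch`, its `State` enum, and the `maxPolls` budget are hypothetical illustrations, not Solr's actual implementation.

```java
import java.util.function.Supplier;

// Hypothetical model of the leader-side wait in PrepRecoveryOp; not Solr's code.
final class PrepRecoveryWaitSketch {
    enum State { ACTIVE, DOWN, RECOVERING }

    /**
     * Polls the observed replica state until it equals RECOVERING or the
     * poll budget is exhausted. Returns the number of polls it took to see
     * RECOVERING, or -1 if the budget ran out first.
     */
    static int waitForRecovering(Supplier<State> observed, int maxPolls) {
        for (int polls = 1; polls <= maxPolls; polls++) {
            if (observed.get() == State.RECOVERING) {
                return polls;
            }
            // A real implementation would sleep between polls; omitted here.
        }
        return -1; // never saw RECOVERING: the leader waited the full budget
    }
}
```

If LIR has just published DOWN and nothing moves the state back to RECOVERING, every poll fails and the wait runs to its limit.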
This scenario can easily be reproduced with the following test:
{code}
String collection = "collection2";
// Create a collection with one shard and two replicas (a leader and a follower).
CollectionAdminRequest
    .createCollection(collection, "config", 1, 2)
    .setMaxShardsPerNode(1)
    .process(cluster.getSolrClient());
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
    cluster.getSolrClient().getZkStateReader(), false, true, 30);
CloudSolrClient cloudClient = cluster.getSolrClient();

// Locate the non-leader replica and the Jetty instance hosting it.
DocCollection docCollection =
    cloudClient.getZkStateReader().getClusterState().getCollection(collection);
Slice slice = docCollection.getSlice("shard1");
Replica replicaNode = slice.getReplicas(replica -> replica != slice.getLeader()).get(0);
JettySolrRunner replicaRunner = cluster.getReplicaJetty(replicaNode);

// Index doc 1 while both replicas are up.
new UpdateRequest()
    .add(sdoc("id", "1"))
    .process(cloudClient, collection);
// Stop the replica so that it misses doc 2.
ChaosMonkey.stop(replicaRunner);
new UpdateRequest()
    .add(sdoc("id", "2"))
    .process(cloudClient, collection);
// Restart the replica; it starts recovering and buffers incoming updates.
ChaosMonkey.start(replicaRunner);
// Index doc 3 while the replica is recovering; a failed forward from the
// leader at this point can trigger LIR.
new UpdateRequest()
    .add(sdoc("id", "3"))
    .process(cloudClient, collection);
// Wait for recovery to finish; the long wait can show up here.
AbstractDistribZkTestBase.waitForRecoveriesToFinish(collection,
    cluster.getSolrClient().getZkStateReader(), false, true, 60);
CollectionAdminRequest
    .deleteCollection(collection)
    .process(cloudClient);
{code}
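The summary suggests that LIR should check whether the node is already recovering before bringing it down. A minimal sketch of that guard follows, assuming a simplified state enum that mirrors Solr's replica states; `LirGuardSketch` and `stateAfterFailedForward` are hypothetical names, and this is not the actual patch for this issue.

```java
// Hypothetical sketch of the check proposed in the summary; not Solr's code.
final class LirGuardSketch {
    enum State { ACTIVE, DOWN, RECOVERING, RECOVERY_FAILED }

    /**
     * Decides which state LIR publishes for a replica after the leader
     * failed to forward an update to it. A replica that is already
     * RECOVERING (and buffering updates) is left as-is instead of being
     * forced DOWN, so a pending PrepRecoveryOp can still observe RECOVERING.
     */
    static State stateAfterFailedForward(State current) {
        if (current == State.RECOVERING) {
            return current;
        }
        return State.DOWN;
    }
}
```

The design choice is to skip only the DOWN transition; replicas in any other state are still brought down as before.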




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
