[jira] [Commented] (SOLR-3939) Solr Cloud recovery and leader election when unloading leader core

Joel Bernstein (JIRA) Mon, 15 Oct 2012 13:41:05 -0700

    [ https://issues.apache.org/jira/browse/SOLR-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476422#comment-13476422 ]


Joel Bernstein commented on SOLR-3939:
--------------------------------------

Not sure if this helps, but here is the stack trace from my second Solr
instance. This is the instance that would become the leader after the leader
core was unloaded on the first instance.

SEVERE: There was a problem finding the leader in zk:org.apache.solr.common.SolrException: Could not get leader props
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:709)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:673)
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1070)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:273)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:82)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:190)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/collection1/leaders/shard1
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
        at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
        at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:687)
        ... 10 more

Oct 15, 2012 3:39:18 PM org.apache.solr.common.SolrException log
SEVERE: :org.apache.solr.common.SolrException: There was a problem finding the leader in zk
        at org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1080)
        at org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:273)
        at org.apache.solr.cloud.ZkController.access$100(ZkController.java:82)
        at org.apache.solr.cloud.ZkController$1.command(ZkController.java:190)
        at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
        at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
        at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
                
> Solr Cloud recovery and leader election when unloading leader core
> ------------------------------------------------------------------
>
>                 Key: SOLR-3939
>                 URL: https://issues.apache.org/jira/browse/SOLR-3939
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0-BETA, 4.0
>            Reporter: Joel Bernstein
>            Assignee: Mark Miller
>              Labels: 4.0.1_Candidate
>             Fix For: 4.1, 5.0
>
>         Attachments: SOLR-3939.patch
>
>
> When a leader core is unloaded using the core admin API, the followers in
> the shard go into recovery but do not come out. Leader election doesn't
> take place and the shard goes down.
> This affects the ability to move a micro-shard from one Solr instance to
> another Solr instance.
> The problem does not occur every time, but it does occur a large
> percentage of the time.
> To set up a test, start up Solr Cloud with a single shard. Add cores to
> that shard as replicas using core admin. Then unload the leader core using
> core admin.
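
For anyone trying to reproduce this, the two core admin calls in the steps above could be built roughly as follows. This is only a sketch, not something taken from the issue: the base URL, the replica core name "replica2", and the helper function names are assumptions for illustration, and a real CREATE call may need further parameters (for example instanceDir) depending on the setup.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr node; adjust host and port for your setup.
SOLR_BASE = "http://localhost:8983/solr"


def core_admin_url(base, **params):
    """Build a CoreAdmin request URL from keyword parameters."""
    return f"{base}/admin/cores?{urlencode(params)}"


def create_replica_url(base, core_name, collection, shard):
    # CREATE with collection and shard set registers the new core under
    # that shard, which makes it a replica of the existing leader.
    return core_admin_url(base, action="CREATE", name=core_name,
                          collection=collection, shard=shard)


def unload_core_url(base, core_name):
    # UNLOAD removes the named core from the node that hosts it.
    return core_admin_url(base, action="UNLOAD", core=core_name)


if __name__ == "__main__":
    # Hypothetical names: "replica2" is a made-up core name; "collection1"
    # and "shard1" match the znode path seen in the stack trace.
    print(create_replica_url(SOLR_BASE, "replica2", "collection1", "shard1"))
    print(unload_core_url(SOLR_BASE, "collection1"))
```

The CREATE URL would be requested against the node that should host the new replica and the UNLOAD URL against the node that currently hosts the leader core. The script only prints the URLs so they can be inspected before being sent.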

