[ 
https://issues.apache.org/jira/browse/SOLR-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502629#comment-13502629
 ] 

Po Rui commented on SOLR-3993:
------------------------------

To Werner:
|2) start only ONE core that has NOT been the former leader.
……
|Result: loop in Running recovery...

That single core will eventually become the leader. It takes a long time 
because waitForReplicasToComeUp() only returns once it times out. After the 
core becomes the leader it calls cancelRecovery(), which stops the recovery 
thread, so it will not loop forever. Before the core becomes the leader, 
however, the recovery thread keeps trying to connect to the old leader to 
recover, and it throws a lot of exceptions along the way due to the dead 
leader. That is fine, since the recovery thread will die too.
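
The sequence described here (wait for replicas until a timeout, take over 
as leader anyway, then stop the recovery thread) can be sketched roughly as 
below. This is only an illustration and not Solr's actual code: the class 
LeaderWaitSketch, the method waitForReplicas and all timeout values are 
made up for the example, and the recovery thread is a plain stand-in for 
what cancelRecovery() stops.

```java
import java.util.function.IntSupplier;

public class LeaderWaitSketch {

    /**
     * Polls until the expected number of replicas is visible or the timeout
     * elapses. Returns true if all replicas were seen, false on timeout.
     * (Hypothetical helper, loosely modelled on the behaviour described above.)
     */
    public static boolean waitForReplicas(IntSupplier liveReplicas, int expected,
                                          long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (liveReplicas.getAsInt() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller proceeds as leader anyway
            }
            Thread.sleep(pollMs);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the recovery thread: keeps retrying until interrupted.
        Thread recovery = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(50); // pretend a recovery attempt failed, retry
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore flag and exit
                }
            }
        });
        recovery.start();

        // Only one of two replicas ever shows up, so the wait ends by timeout.
        boolean allUp = waitForReplicas(() -> 1, 2, 500, 50);

        // The core is now the leader; stop the recovery stand-in.
        recovery.interrupt();
        recovery.join();
        System.out.println("allReplicasUp=" + allUp
                + ", recoveryAlive=" + recovery.isAlive());
        // prints "allReplicasUp=false, recoveryAlive=false"
    }
}
```

The point of the sketch is that the long startup delay and the burst of 
recovery errors both end once the timeout is reached and the core takes 
over as leader.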

|Second Problem (might be the at least similar):
A live ZooKeeper is a precondition for a Solr cluster, and the ZooKeeper 
service must be ready before a Solr instance is started. You had better run 
ZooKeeper as a separate service.
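
One way to make sure ZooKeeper is ready before starting Solr is to probe it 
first. The sketch below uses ZooKeeper's "ruok" four-letter command, which a 
healthy server answers with "imok". The class name ZkReadyCheck, the 
default host and port, and the retry counts are assumptions made for this 
example; newer ZooKeeper releases may also require the command to be 
enabled through the 4lw.commands.whitelist setting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ZkReadyCheck {

    /** Returns true if host:port answers "imok" to the "ruok" command. */
    public static boolean isZkOk(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read up to four bytes; the server may close right after replying.
            InputStream in = socket.getInputStream();
            byte[] buf = new byte[4];
            int read = 0;
            while (read < buf.length) {
                int n = in.read(buf, read, buf.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return "imok".equals(new String(buf, 0, read, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            return false; // refused, timed out, or otherwise unreachable
        }
    }

    /** Retries the probe a fixed number of times, pausing between attempts. */
    public static boolean waitForZk(String host, int port, int attempts, long pauseMs)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (isZkOk(host, port, 2000)) {
                return true;
            }
            Thread.sleep(pauseMs);
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Assumed defaults; pass host and port as arguments to override.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        boolean ready = waitForZk(host, port, 10, 1000);
        System.out.println(ready ? "ZooKeeper is ready" : "ZooKeeper is not reachable");
        System.exit(ready ? 0 : 1);
    }
}
```

A startup script could run this check and launch the Solr container only 
when it exits with status 0.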
                
> SolrCloud leader election on single node stucks the initialization
> ------------------------------------------------------------------
>
>                 Key: SOLR-3993
>                 URL: https://issues.apache.org/jira/browse/SOLR-3993
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>         Environment: Windows 7, Tomcat 6
>            Reporter: Alexey Kudinov
>            Assignee: Mark Miller
>             Fix For: 4.1, 5.0
>
>
>  setup:
> 1 node, 4 cores, 2 shards.
> 15 documents indexed.
> problem:
> init stage times out.
> probable cause:
> According to the init flow, cores are initialized one by one synchronously.
> The main thread actually waits in 
> ShardLeaderElectionContext.waitForReplicasToComeUp until the retry 
> threshold is reached, while the replica cores are not yet initialized. In 
> other words, the other replicas have no chance to come up in the meantime.
> stack trace:
> Thread [main] (Suspended)
>         owns: HashMap<K,V>  (id=3876)
>         owns: StandardContext  (id=3877)
>         owns: HashMap<K,V>  (id=3878)
>         owns: StandardHost  (id=3879)
>         owns: StandardEngine  (id=3880)
>         owns: Service[]  (id=3881)
>         Thread.sleep(long) line: not available [native method]
>         ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
>         ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
>         LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
>         LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
>         LeaderElector.joinElection(ElectionContext) line: 262
>         ZkController.joinElection(CoreDescriptor, boolean) line: 733
>         ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
>         ZkController.register(String, CoreDescriptor) line: 532
>         CoreContainer.registerInZk(SolrCore) line: 709
>         CoreContainer.register(String, SolrCore, boolean) line: 693
>         CoreContainer.load(String, InputSource) line: 535
>         CoreContainer.load(String, File) line: 356
>         CoreContainer$Initializer.initialize() line: 308
>         SolrDispatchFilter.init(FilterConfig) line: 107
>         ApplicationFilterConfig.getFilter() line: 295
>         ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
>         ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
>         StandardContext.filterStart() line: 4072
>         StandardContext.start() line: 4726
>         StandardHost(ContainerBase).addChildInternal(Container) line: 799
>         StandardHost(ContainerBase).addChild(Container) line: 779
>         StandardHost.addChild(Container) line: 601
>         HostConfig.deployDescriptor(String, File, String) line: 675
>         HostConfig.deployDescriptors(File, String[]) line: 601
>         HostConfig.deployApps() line: 502
>         HostConfig.start() line: 1317
>         HostConfig.lifecycleEvent(LifecycleEvent) line: 324
>         LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
>         StandardHost(ContainerBase).start() line: 1065
>         StandardHost.start() line: 840
>         StandardEngine(ContainerBase).start() line: 1057
>         StandardEngine.start() line: 463
>         StandardService.start() line: 525
>         StandardServer.start() line: 754
>         Catalina.start() line: 595
>         NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
>         NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
>         DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
>         Method.invoke(Object, Object...) line: not available
>         Bootstrap.start() line: 289
>         Bootstrap.main(String[]) line: 414
>        
> After a while, the session times out and the following exception appears:
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
> Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
> INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
> Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
> SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>         at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
>         at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
>         at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
>         at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
>         at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
>         at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
>         at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
>         at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
>         at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
>         at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
>         at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
>         at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
>         at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
>         at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
>         at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
>         at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
>         at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
>         at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
>         at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
>         at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
>         at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
>         at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
>         at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
>         at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
>         at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
>         at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
>         at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
>         at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
>         at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
>         at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
>         at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
>         at org.apache.catalina.core.StandardService.start(StandardService.java:525)
>         at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
>         at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>         at java.lang.reflect.Method.invoke(Unknown Source)
>         at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
>         at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
> Followed by:
> Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
> SEVERE: Recovery failed - trying again... core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1
> Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
> SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
>         at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
>         at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
>         at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)

