Alexey Kudinov created SOLR-3993:
------------------------------------
Summary: SolrCloud leader election on a single node stalls initialization
Key: SOLR-3993
URL: https://issues.apache.org/jira/browse/SOLR-3993
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0
Environment: Windows 7, Tomcat 6
Reporter: Alexey Kudinov
setup:
1 node, 4 cores, 2 shards.
15 documents indexed.
problem:
The init stage times out.
probable cause:
According to the init flow, cores are initialized one at a time, synchronously.
The main thread waits in ShardLeaderElectionContext.waitForReplicasToComeUp
until the retry threshold is reached, while the replica cores have not yet been
initialized. In other words, the other replicas have no chance to come up in
the meantime.
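To illustrate the suspected cause, here is a minimal, self-contained sketch. It is not Solr code: the class SequentialElectionSketch and its methods are hypothetical stand-ins, and the wait loop is only a simplified model of what waitForReplicasToComeUp appears to do. It assumes each core joins the election and then blocks the same thread that would have to start the next core.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical model of the suspected flow; none of these names exist in Solr.
public class SequentialElectionSketch {

    // Replicas that have joined the simulated election for one shard.
    static final List<String> participants = new CopyOnWriteArrayList<>();

    // Simplified stand-in for waitForReplicasToComeUp: poll until the expected
    // number of replicas is visible or the timeout elapses, then return how
    // many replicas were found.
    static int waitForReplicas(int expected, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (participants.size() < expected
                && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return participants.size();
    }

    // Registers cores one by one on the calling thread. Each core blocks in
    // waitForReplicas before the next core is even started.
    static List<Integer> registerSequentially(List<String> cores, int expected,
                                              long timeoutMs) {
        participants.clear();
        List<Integer> seen = new ArrayList<>();
        for (String core : cores) {
            participants.add(core);
            seen.add(waitForReplicas(expected, timeoutMs));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> seen =
                registerSequentially(Arrays.asList("core1", "core2"), 2, 200);
        // The first core can only ever see itself, so it always waits out
        // the full timeout; the second core sees both immediately.
        System.out.println("replicas seen while waiting: " + seen);
    }
}
```

In this model the first core always waits for the full timeout, because the replica it is waiting for can only be started by the thread that is currently sleeping.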
stack trace:
Thread [main] (Suspended)
owns: HashMap<K,V> (id=3876)
owns: StandardContext (id=3877)
owns: HashMap<K,V> (id=3878)
owns: StandardHost (id=3879)
owns: StandardEngine (id=3880)
owns: Service[] (id=3881)
Thread.sleep(long) line: not available [native method]
ShardLeaderElectionContext.waitForReplicasToComeUp(boolean, String) line: 298
ShardLeaderElectionContext.runLeaderProcess(boolean) line: 143
LeaderElector.runIamLeaderProcess(ElectionContext, boolean) line: 152
LeaderElector.checkIfIamLeader(int, ElectionContext, boolean) line: 96
LeaderElector.joinElection(ElectionContext) line: 262
ZkController.joinElection(CoreDescriptor, boolean) line: 733
ZkController.register(String, CoreDescriptor, boolean, boolean) line: 566
ZkController.register(String, CoreDescriptor) line: 532
CoreContainer.registerInZk(SolrCore) line: 709
CoreContainer.register(String, SolrCore, boolean) line: 693
CoreContainer.load(String, InputSource) line: 535
CoreContainer.load(String, File) line: 356
CoreContainer$Initializer.initialize() line: 308
SolrDispatchFilter.init(FilterConfig) line: 107
ApplicationFilterConfig.getFilter() line: 295
ApplicationFilterConfig.setFilterDef(FilterDef) line: 422
ApplicationFilterConfig.<init>(Context, FilterDef) line: 115
StandardContext.filterStart() line: 4072
StandardContext.start() line: 4726
StandardHost(ContainerBase).addChildInternal(Container) line: 799
StandardHost(ContainerBase).addChild(Container) line: 779
StandardHost.addChild(Container) line: 601
HostConfig.deployDescriptor(String, File, String) line: 675
HostConfig.deployDescriptors(File, String[]) line: 601
HostConfig.deployApps() line: 502
HostConfig.start() line: 1317
HostConfig.lifecycleEvent(LifecycleEvent) line: 324
LifecycleSupport.fireLifecycleEvent(String, Object) line: 142
StandardHost(ContainerBase).start() line: 1065
StandardHost.start() line: 840
StandardEngine(ContainerBase).start() line: 1057
StandardEngine.start() line: 463
StandardService.start() line: 525
StandardServer.start() line: 754
Catalina.start() line: 595
NativeMethodAccessorImpl.invoke0(Method, Object, Object[]) line: not available [native method]
NativeMethodAccessorImpl.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: not available
Method.invoke(Object, Object...) line: not available
Bootstrap.start() line: 289
Bootstrap.main(String[]) line: 414
After a while, the session times out and the following exception appears:
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=0 timeoutin=-95
Oct 25, 2012 1:16:56 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Was waiting for replicas to come up, but they are taking too long - assuming they won't come back till later
Oct 25, 2012 1:16:56 PM org.apache.solr.common.SolrException log
SEVERE: Errir checking for the number of election participants:org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/collection1/leader_elect/shard2/election
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:227)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:224)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:224)
at org.apache.solr.cloud.ShardLeaderElectionContext.waitForReplicasToComeUp(ElectionContext.java:276)
at org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:143)
at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:152)
at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:96)
at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:262)
at org.apache.solr.cloud.ZkController.joinElection(ZkController.java:733)
at org.apache.solr.cloud.ZkController.register(ZkController.java:566)
at org.apache.solr.cloud.ZkController.register(ZkController.java:532)
at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:709)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:693)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:535)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:356)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:308)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:107)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:675)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:601)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:502)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Followed by:
Oct 25, 2012 1:17:27 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1
Oct 25, 2012 1:18:32 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. core=collection1:org.apache.solr.common.SolrException: No registered leader was found, collection:collection1 slice:shard1
at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:413)
at org.apache.solr.common.cloud.ZkStateReader.getLeaderProps(ZkStateReader.java:399)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:318)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)