[ https://issues.apache.org/jira/browse/SOLR-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428043#comment-16428043 ]
Mikhail Khludnev commented on SOLR-7736:
----------------------------------------

Attaching the excerpt from [https://builds.apache.org/job/PreCommit-SOLR-Build/39/console] [^ZkController.failure.txt]

It goes like this:
{quote}
[junit4] 2> 499911 INFO (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr ] o.a.s.c.s.i.ZkClientClusterStateProvider Cluster at 127.0.0.1:40606/solr ready
[junit4] 2> 499916 INFO (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/127.0.0.1:8983_solr
[junit4] 2> 499919 INFO (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [ ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
[junit4] 2> 499919 INFO (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr ] o.a.s.c.Overseer Overseer (id=73578760132362243-127.0.0.1:8983_solr-n_0000000000) closing
[junit4] 2> 499920 INFO (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [ ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
[junit4] 2> 499920 ERROR (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [ ] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
[junit4] 2> java.lang.InterruptedException: null
[junit4] 2>     at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
[junit4] 2>     at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_152]
[junit4] 2>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2>     at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1100) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient.lambda$exists$3(SolrZkClient.java:316) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:316) ~[java/:?]
[junit4] 2>     at org.apache.solr.cloud.OverseerNodePrioritizer.prioritizeOverseerNodes(OverseerNodePrioritizer.java:60) ~[java/:?]
[junit4] 2>     at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:178) [java/:?]
[junit4] 2>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
[junit4] 2> 499934 WARN (OverseerExitThread) [ ] o.a.s.c.Overseer I'm exiting, but I'm still the leader
[junit4] 2> 499939 INFO (OverseerExitThread) [ ] o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:8983_solr
[junit4] 2> 499940 INFO (OverseerExitThread) [ ] o.a.s.c.Overseer Overseer (id=73578760132362243-127.0.0.1:8983_solr-n_0000000001) starting
[junit4] 2> 499948 ERROR (OverseerAutoScalingTriggerThread-73578760132362243-127.0.0.1:8983_solr-n_0000000001) [ ] o.a.s.c.a.OverseerTriggerThread A ZK error has occurred
[junit4] 2> java.io.IOException: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /autoscaling.json
[junit4] 2>     at org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateManager.java:183) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.cloud.DistribStateManager.getAutoScalingConfig(DistribStateManager.java:83) ~[java/:?]
[junit4] 2>     at org.apache.solr.cloud.autoscaling.OverseerTriggerThread.run(OverseerTriggerThread.java:131) [java/:?]
[junit4] 2>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
[junit4] 2> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /autoscaling.json
[junit4] 2>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:130) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2>     at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:340) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[java/:?]
[junit4] 2>     at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:340) ~[java/:?]
[junit4] 2>     at org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateMan
{quote}
Then it starts spinning on "Session expired". At the end, a leak of both the Overseer and OverseerAutoScalingTriggerThread is detected. I have two questions:
* Could it be that the first "exiting" Overseer leaks?
* Can't OverseerAutoScalingTriggerThread restore the expired session?

> Add a test for ZkController.publishAndWaitForDownStates
> -------------------------------------------------------
>
>                 Key: SOLR-7736
>                 URL: https://issues.apache.org/jira/browse/SOLR-7736
>             Project: Solr
>          Issue Type: Test
>          Components: SolrCloud, Tests
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 7.4, master (8.0)
>
>         Attachments: SOLR-7736.patch, SOLR-7736.patch,
> ZkController.failure.txt, consoleFull-2462-ZkControllerTest.txt.gz
>
>
> Add a test for ZkController.publishAndWaitForDownStates so that bugs like
> SOLR-6665 do not occur again.
> A test exists but it is not correct and currently disabled via AwaitsFix.