[
https://issues.apache.org/jira/browse/SOLR-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428043#comment-16428043
]
Mikhail Khludnev edited comment on SOLR-7736 at 4/14/18 10:42 AM:
------------------------------------------------------------------
Attaching an excerpt from
[https://builds.apache.org/job/PreCommit-SOLR-Build/39/console]
[^ZkController.failure.txt]
It goes like this:
{quote}
[junit4] 2> 499911 INFO
(TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77])
[n:127.0.0.1:8983_solr
] o.a.s.c.s.i.ZkClientClusterStateProvider Cluster at 127.0.0.1:40606/solr
ready
[junit4] 2> 499916 INFO
(TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77])
[n:127.0.0.1:8983_solr
] o.a.s.c.ZkController Register node as live in
ZooKeeper:/live_nodes/127.0.0.1:8983_solr
[junit4] 2> 499919 INFO
(OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [ ]
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
[junit4] 2> 499919 INFO
(TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77])
[n:127.0.0.1:8983_solr
] o.a.s.c.Overseer Overseer
(id=73578760132362243-127.0.0.1:8983_solr-n_0000000000) closing
[junit4] 2> 499920 INFO
(OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [ ]
o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
[junit4] 2> 499920 ERROR
(OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_0000000000)
[
] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
[junit4] 2> java.lang.InterruptedException: null
[junit4] 2> at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
[junit4] 2> at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_152]
[junit4] 2> at
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409)
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2> at
org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1100)
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2> at
org.apache.solr.common.cloud.SolrZkClient.lambda$exists$3(SolrZkClient.java:316)
~[java/:?]
[junit4] 2> at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
~[java/:?]
[junit4] 2> at
org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:316)
~[java/:?]
[junit4] 2> at
org.apache.solr.cloud.OverseerNodePrioritizer.prioritizeOverseerNodes(OverseerNodePrioritizer.java:60)
~[java/:?]
[junit4] 2> at
org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:178)
[java/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
[junit4] 2> 499934 WARN (OverseerExitThread) [ ] o.a.s.c.Overseer I'm
exiting, but I'm still the leader
[junit4] 2> 499939 INFO (OverseerExitThread) [ ]
o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:8983_solr
[junit4] 2> 499940 INFO (OverseerExitThread) [ ] o.a.s.c.Overseer
Overseer (id=73578760132362243-127.0.0.1:8983_solr-n_0000000001) starting
[junit4] 2> 499948 ERROR
(OverseerAutoScalingTriggerThread-73578760132362243-127.0.0.1:8983_solr-n_0000000001)
[ ] o.a.s.c.a.OverseerTriggerThread A ZK error has occurred
[junit4] 2> java.io.IOException:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /autoscaling.json
[junit4] 2> at
org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateManager.java:183)
~[java/:?]
[junit4] 2> at
org.apache.solr.client.solrj.cloud.DistribStateManager.getAutoScalingConfig(DistribStateManager.java:83)
~[java/:?]
[junit4] 2> at
org.apache.solr.cloud.autoscaling.OverseerTriggerThread.run(OverseerTriggerThread.java:131)
[java/:?]
[junit4] 2> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
[junit4] 2> Caused by:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /autoscaling.json
[junit4] 2> at
org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2> at
org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2> at
org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215)
~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
[junit4] 2> at
org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:340)
~[java/:?]
[junit4] 2> at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
~[java/:?]
[junit4] 2> at
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:340)
~[java/:?]
[junit4] 2> at
org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateMan
{quote}
Then it starts spinning on session-expired errors. At the end, leaks of the
Overseer and of OverseerAutoScalingTriggerThread are detected.
I have two questions: can the first "exiting" Overseer leak?
-Can't OverseerAutoScalingTriggerThread restore the expired session?- *UPD* No,
ZkController is responsible for reconnecting and restarting.
Follow-up: SOLR-12200
> Add a test for ZkController.publishAndWaitForDownStates
> -------------------------------------------------------
>
> Key: SOLR-7736
> URL: https://issues.apache.org/jira/browse/SOLR-7736
> Project: Solr
> Issue Type: Test
> Components: SolrCloud, Tests
> Reporter: Shalin Shekhar Mangar
> Assignee: Shalin Shekhar Mangar
> Priority: Minor
> Fix For: 7.4, master (8.0)
>
> Attachments: SOLR-7736.patch, SOLR-7736.patch,
> ZkController.failure.txt, consoleFull-2462-ZkControllerTest.txt.gz
>
>
> Add a test for ZkController.publishAndWaitForDownStates so that bugs like
> SOLR-6665 do not occur again. A test exists but it is not correct and
> currently disabled via AwaitsFix.