[ https://issues.apache.org/jira/browse/SOLR-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428043#comment-16428043 ]

Mikhail Khludnev edited comment on SOLR-7736 at 4/14/18 10:42 AM:
------------------------------------------------------------------

Attaching an excerpt from 
[https://builds.apache.org/job/PreCommit-SOLR-Build/39/console]: 
[^ZkController.failure.txt]
It goes like this: 
{quote}
   [junit4]   2> 499911 INFO  (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr    ] o.a.s.c.s.i.ZkClientClusterStateProvider Cluster at 127.0.0.1:40606/solr ready
   [junit4]   2> 499916 INFO  (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr    ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/127.0.0.1:8983_solr
   [junit4]   2> 499919 INFO  (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [    ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
   [junit4]   2> 499919 INFO  (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) [n:127.0.0.1:8983_solr    ] o.a.s.c.Overseer Overseer (id=73578760132362243-127.0.0.1:8983_solr-n_0000000000) closing
   [junit4]   2> 499920 INFO  (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [    ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
   [junit4]   2> 499920 ERROR (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [    ] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
   [junit4]   2> java.lang.InterruptedException: null
   [junit4]   2>        at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
   [junit4]   2>        at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_152]
   [junit4]   2>        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
   [junit4]   2>        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1100) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
   [junit4]   2>        at org.apache.solr.common.cloud.SolrZkClient.lambda$exists$3(SolrZkClient.java:316) ~[java/:?]
   [junit4]   2>        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[java/:?]
   [junit4]   2>        at org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:316) ~[java/:?]
   [junit4]   2>        at org.apache.solr.cloud.OverseerNodePrioritizer.prioritizeOverseerNodes(OverseerNodePrioritizer.java:60) ~[java/:?]
   [junit4]   2>        at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:178) [java/:?]
   [junit4]   2>        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
   [junit4]   2> 499934 WARN  (OverseerExitThread) [    ] o.a.s.c.Overseer I'm exiting, but I'm still the leader
   [junit4]   2> 499939 INFO  (OverseerExitThread) [    ] o.a.s.c.OverseerElectionContext I am going to be the leader 127.0.0.1:8983_solr
   [junit4]   2> 499940 INFO  (OverseerExitThread) [    ] o.a.s.c.Overseer Overseer (id=73578760132362243-127.0.0.1:8983_solr-n_0000000001) starting
   [junit4]   2> 499948 ERROR (OverseerAutoScalingTriggerThread-73578760132362243-127.0.0.1:8983_solr-n_0000000001) [    ] o.a.s.c.a.OverseerTriggerThread A ZK error has occurred
   [junit4]   2> java.io.IOException: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /autoscaling.json
   [junit4]   2>        at org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateManager.java:183) ~[java/:?]
   [junit4]   2>        at org.apache.solr.client.solrj.cloud.DistribStateManager.getAutoScalingConfig(DistribStateManager.java:83) ~[java/:?]
   [junit4]   2>        at org.apache.solr.cloud.autoscaling.OverseerTriggerThread.run(OverseerTriggerThread.java:131) [java/:?]
   [junit4]   2>        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_152]
   [junit4]   2> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /autoscaling.json
   [junit4]   2>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:130) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
   [junit4]   2>        at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
   [junit4]   2>        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215) ~[zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]
   [junit4]   2>        at org.apache.solr.common.cloud.SolrZkClient.lambda$getData$5(SolrZkClient.java:340) ~[java/:?]
   [junit4]   2>        at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[java/:?]
   [junit4]   2>        at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:340) ~[java/:?]
   [junit4]   2>        at org.apache.solr.client.solrj.impl.ZkDistribStateManager.getAutoScalingConfig(ZkDistribStateMan
{quote}
After that it starts spinning on session-expired errors. At the end, leaks of 
the Overseer and of OverseerAutoScalingTriggerThread are detected.
I have two questions: could the first, "exiting" Overseer be the one that leaks? 
-Can't OverseerAutoScalingTriggerThread restore the expired session?- *UPD* No, 
ZkController is responsible for reconnecting and restarting. 

Follow-up: SOLR-12200
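
To illustrate the *UPD* note: if a worker thread cannot restore an expired session itself, retrying on that session can only spin. Below is a minimal sketch of a loop that retries transient failures but stops on session expiry. It is not Solr's actual code. The class SessionExpirySketch, its nested exception types, the Outcome enum and readWithRetry are all hypothetical names made up for this example, and the two exceptions only stand in for ZooKeeper's session-expired and connection-loss errors.

```java
import java.util.concurrent.Callable;

class SessionExpirySketch {
    /** Hypothetical stand-in for ZooKeeper's SessionExpiredException. */
    static class SessionExpired extends Exception {}

    /** Hypothetical stand-in for a transient failure such as connection loss. */
    static class ConnectionLoss extends Exception {}

    enum Outcome { OK, SESSION_EXPIRED, GAVE_UP }

    /**
     * Calls readConfig up to maxAttempts times.
     * Transient failures are retried; session expiry ends the loop at once,
     * because (in this sketch) only some outer component can reconnect.
     */
    static Outcome readWithRetry(Callable<String> readConfig, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                readConfig.call();
                return Outcome.OK;
            } catch (SessionExpired e) {
                // Fatal for this thread: stop instead of spinning.
                return Outcome.SESSION_EXPIRED;
            } catch (ConnectionLoss e) {
                // Transient: fall through and try again.
            } catch (Exception e) {
                // Anything else is unexpected in this sketch.
                throw new RuntimeException(e);
            }
        }
        return Outcome.GAVE_UP;
    }
}
```

With a reader that always throws SessionExpired, readWithRetry returns SESSION_EXPIRED after a single call, so the thread can end and leave reconnecting to whoever owns the session.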
 



> Add a test for ZkController.publishAndWaitForDownStates
> -------------------------------------------------------
>
>                 Key: SOLR-7736
>                 URL: https://issues.apache.org/jira/browse/SOLR-7736
>             Project: Solr
>          Issue Type: Test
>          Components: SolrCloud, Tests
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>            Priority: Minor
>             Fix For: 7.4, master (8.0)
>
>         Attachments: SOLR-7736.patch, SOLR-7736.patch, 
> ZkController.failure.txt, consoleFull-2462-ZkControllerTest.txt.gz
>
>
> Add a test for ZkController.publishAndWaitForDownStates so that bugs like 
> SOLR-6665 do not occur again. A test exists but it is not correct and 
> currently disabled via AwaitsFix.


