[ 
https://issues.apache.org/jira/browse/SOLR-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15578736#comment-15578736
 ] 

Mikhail Khludnev commented on SOLR-9647:
----------------------------------------

Here are excerpts from the failure log tail.
{code}
 2> 90   INFO  
(SUITE-CollectionsAPIDistributedZkTest-seed#[355E7B68C1B5A5B6]-worker) [    ] 
o.a.s.SolrTestCaseJ4 Randomized ssl (true) and clientAuth (false) via: 
...
  2> 263082 INFO  (zkCallback-32-thread-2-processing-n:127.0.0.1:49743_) 
[n:127.0.0.1:49743_    ] o.a.s.c.Overseer Overseer 
(id=96767662755807251-127.0.0.1:49743_-n_0000000003) starting
  2> 263083 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) 
[n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] 
o.a.s.c.ShardLeaderElectionContextBase No version found for ephemeral leader 
parent node, won't remove previous leader registration.
  2> 263087 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) 
[n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] 
o.a.s.c.ActionThrottle The last leader attempt started 21ms ago.
  2> 263087 INFO  (zkCallback-39-thread-4-processing-n:127.0.0.1:49770_) 
[n:127.0.0.1:49770_ c:collection1 s:shard1 r:core_node4 x:collection1] 
o.a.s.c.ActionThrottle Throttling leader attempts - waiting for 4978ms
  2> 264298 ERROR (zkCallback-15-thread-2-EventThread) [    ] 
o.a.s.c.c.ZkStateReader Error reading cluster properties from zookeeper
  2> org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for /clusterprops.json
  2>    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
...
{code}

{code}
268216 WARN  (Thread-1) [    ] o.a.s.c.ZkTestServer Watch limit violations: 
  2> Maximum concurrent create/delete watches above limit:
  2> 
  2>    12      /solr/aliases.json
  2>    5       /solr/security.json
  2>    5       /solr/configs/conf1
  2>    4       /solr/collections/collection1/state.json
  2> 
  2> Maximum concurrent data watches above limit:
  2> 
  2>    12      /solr/clusterstate.json
  2>    12      /solr/clusterprops.json
  2> 
  2> Maximum concurrent children watches above limit:
  2> 
  2>    109     /solr/overseer/collection-queue-work
  2>    39      /solr/overseer/queue
  2>    12      /solr/live_nodes
  2>    12      /solr/collections
  2>    11      /solr/overseer/queue-work
  2> 
{code}

I don't know the details, but what is "ActionThrottle Throttling leader 
attempts - waiting for 4978ms" about? Is the test aware of such throttling? 
Even if the concurrent watch limits mean nothing by themselves, isn't there a 
leak of watches? 
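
To make the throttling question concrete, here is a minimal sketch of how a 
throttle of this kind could behave. It is not Solr's ActionThrottle code. The 
class name, method names, and the 5000 ms minimum interval are assumptions; 
the interval is only inferred from the log above, where 21 ms elapsed plus a 
4978 ms wait comes to roughly 5 seconds.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; NOT Solr's ActionThrottle. */
class LeaderAttemptThrottle {
    // Assumed minimum gap between two attempts, in milliseconds.
    private final long minIntervalMs;
    private long lastAttemptNanos;
    private boolean attempted = false;

    LeaderAttemptThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** How long to wait, given the time since the previous attempt started. */
    public static long remainingWaitMs(long minIntervalMs, long elapsedMs) {
        return Math.max(0L, minIntervalMs - elapsedMs);
    }

    /** Record that an attempt is starting now. */
    public void markAttempt() {
        lastAttemptNanos = System.nanoTime();
        attempted = true;
    }

    /** Block until at least minIntervalMs has passed since the last attempt. */
    public void waitIfNeeded() throws InterruptedException {
        if (!attempted) {
            return; // first attempt is never delayed
        }
        long elapsedMs =
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastAttemptNanos);
        long waitMs = remainingWaitMs(minIntervalMs, elapsedMs);
        if (waitMs > 0) {
            Thread.sleep(waitMs);
        }
    }
}
```

If the real throttle works along these lines, a test that expects a new 
leader within a few seconds of the previous attempt would have to allow for 
this extra delay.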

> CollectionsAPIDistributedZkTest got stuck, reproduces failure
> -------------------------------------------------------------
>
>                 Key: SOLR-9647
>                 URL: https://issues.apache.org/jira/browse/SOLR-9647
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Mikhail Khludnev
>
>  I had to kill 
> https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1129/ just 
> because "Took 1 day 12 hr on lucene".
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:08:30, 
> stalled for 48990s at: CollectionsAPIDistributedZkTest.test
>    [junit4] HEARTBEAT J0 PID(30506@lucene1-us-west): 2016-10-15T00:09:30, 
> stalled for 49050s at: CollectionsAPIDistributedZkTest.test
>  It just got stuck. When I run it locally, it passes from Eclipse but 
> fails when I run it from the command line with ant. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
