[ https://issues.apache.org/jira/browse/SOLR-10420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated SOLR-10420:
------------------------------
Attachment: OverseerTest.80.stdout
I ran all Solr tests with the patch on master, and one test failed:
{noformat}
[junit4] 2> 264992 ERROR (OverseerExitThread) [ ] o.a.s.c.Overseer could not read the data
[junit4] 2> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /overseer_elect/leader
[junit4] 2> at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
[junit4] 2> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
[junit4] 2> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
[junit4] 2> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
[junit4] 2> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
[junit4] 2> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
[junit4] 2> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
[junit4] 2> at org.apache.solr.cloud.Overseer$ClusterStateUpdater.checkIfIamStillLeader(Overseer.java:290)
[junit4] 2> at java.lang.Thread.run(Thread.java:745)
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=OverseerTest -Dtests.method=testExternalClusterStateChangeBehavior -Dtests.seed=2110CE0AEF674CFA -Dtests.slow=true -Dtests.locale=es-GT -Dtests.timezone=Asia/Kolkata -Dtests.asserts=true -Dtests.file.encoding=UTF-8
[junit4] FAILURE 5.46s J12 | OverseerTest.testExternalClusterStateChangeBehavior <<<
[junit4] > Throwable #1: java.lang.AssertionError: Illegal state, was: down expected:active clusterState:live nodes:[]collections:{c1=DocCollection(c1//clusterstate.json/2)={
[junit4] > "shards":{"shard1":{
[junit4] > "parent":null,
[junit4] > "range":null,
[junit4] > "state":"active",
[junit4] > "replicas":{"core_node1":{
[junit4] > "base_url":"http://127.0.0.1/solr",
[junit4] > "node_name":"node1",
[junit4] > "core":"core1",
[junit4] > "roles":"",
[junit4] > "state":"down"}}}},
[junit4] > "router":{"name":"implicit"}}, test=LazyCollectionRef(test)}
[junit4] > at __randomizedtesting.SeedInfo.seed([2110CE0AEF674CFA:490ECDE60DF716B4]:0)
[junit4] > at org.apache.solr.cloud.AbstractDistribZkTestBase.verifyReplicaStatus(AbstractDistribZkTestBase.java:273)
[junit4] > at org.apache.solr.cloud.OverseerTest.testExternalClusterStateChangeBehavior(OverseerTest.java:1259)
{noformat}
I ran the repro line a couple of times and it didn't reproduce. I then beasted
100 iterations of the test suite using Miller's beasting script, and it failed
once. I'm attaching the test log from the failure.
Looking at emailed Jenkins reports of
{{testExternalClusterStateChangeBehavior()}} failing, I see that it was failing
almost daily until the day SOLR-9191 was committed (June 9, 2016), and there
have been zero failures since. Because this issue is related to SOLR-9191,
this failure seems suspicious to me.
I beasted 200 iterations of OverseerTest without the patch, and got zero
failures.
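
For readers unfamiliar with beasting, the runs described above amount to repeating the same test command many times and recording which iterations fail. The following is only a minimal sketch of that idea under stated assumptions, not Miller's actual script: the function name {{beast}} and its return format are hypothetical, and the iterations run sequentially, which a real beasting script may not do.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def beast(command: Sequence[str], iterations: int) -> List[Tuple[int, str]]:
    """Run `command` `iterations` times, one run after another.

    Returns a list of (iteration_number, combined_output) pairs, one for
    each run that exited with a non-zero status. Iterations are numbered
    from 1. The name and return shape are illustrative assumptions.
    """
    failures = []
    for i in range(1, iterations + 1):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured log
            universal_newlines=True,   # decode output as text
        )
        if result.returncode != 0:
            # Keep the log of the failing run so it can be attached later.
            failures.append((i, result.stdout))
    return failures


# Hypothetical usage, run from the directory where `ant test` works
# (command taken from the "reproduce with" line in the log above):
#
#   failed = beast(
#       ["ant", "test", "-Dtestcase=OverseerTest",
#        "-Dtests.method=testExternalClusterStateChangeBehavior"],
#       100,
#   )
#   print(len(failed), "of 100 iterations failed")
```

Omitting {{-Dtests.seed}} in the usage comment is deliberate in this sketch: the fixed-seed repro line did not reproduce the failure, so each iteration is left to pick its own seed.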
> Solr 6.x leaking one SolrZkClient instance per second
> -----------------------------------------------------
>
> Key: SOLR-10420
> URL: https://issues.apache.org/jira/browse/SOLR-10420
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 6.5, 6.4.2
> Reporter: Markus Jelsma
> Fix For: master (7.0), branch_6x
>
> Attachments: OverseerTest.80.stdout, SOLR-10420.patch
>
>
> One of our nodes went berserk after a restart; Solr went completely nuts!
> So I opened VisualVM to keep an eye on it and spotted a different problem
> that occurs in all our Solr 6.4.2 and 6.5.0 nodes.
> It appears Solr is leaking one SolrZkClient instance per second via
> DistributedQueue$ChildWatcher. That rate of one per second is quite accurate
> for all nodes: there are about as many instances as there are seconds
> since Solr started. I know VisualVM's instance count includes
> objects-to-be-collected, but the instance count does not drop after a forced
> garbage collection round.
> It doesn't matter how many cores or collections the nodes carry or how heavy
> traffic is.