[
https://issues.apache.org/jira/browse/SOLR-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438323#comment-16438323
]
Mikhail Khludnev edited comment on SOLR-12200 at 4/14/18 7:33 PM:
------------------------------------------------------------------
Attached [^SOLR-12200.patch]:
# It breaks the spin on /autoscaling on expiration; see the "InterruptedException
handling between solr->zk interactions" mail thread.
# -It adds a few, probably redundant, close() calls.-
# The cause of the leak is fixed by introducing Overseer.closing. It's just a proof
of concept; it should probably be more elegant.
h2. Current leak scenario
* ZkController.close() is called.
* Overseer.close() interrupts the threads, but has not yet set closed=true.
* ClusterStatusUpdater exits the loop and spawns a new thread to check its own
leadership (though I'd rather just clear the interrupted flag):
https://github.com/apache/lucene-solr/blob/93f9a65b1c8aa460489fdce50ed84d18168b53ef/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L256
* But neither the shutdown flag nor a closing flag is seen there, so it invokes
{{zkController.rejoinOverseerElection(null, false);}}, which leaks the nearly closed
Overseer. The leaked Overseer's stack trace confirms this.
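The leak scenario above can be modeled in miniature to show why a flag set *before* the interrupt matters. This is a hypothetical, simplified sketch and not Solr's Overseer code. `MiniOverseer`, `checkIfStillLeader`, the `useClosingFlag` switch and the `rejoins` counter are invented for illustration. The counter stands in for `zkController.rejoinOverseerElection(null, false)`. To make the window before `closed=true` deterministic, `close()` waits for the exit thread before it sets `closed`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the shutdown race; NOT Solr's Overseer code.
public class OverseerCloseRace {

    static class MiniOverseer {
        // Stand-in for zkController.rejoinOverseerElection(null, false) calls.
        final AtomicInteger rejoins = new AtomicInteger();
        private final boolean useClosingFlag;
        private volatile boolean closing = false; // set first in close(); read only if useClosingFlag
        private volatile boolean closed = false;  // set only at the very end of close()
        private final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();
        private Thread updater;
        private volatile Thread exitThread;

        MiniOverseer(boolean useClosingFlag) {
            this.useClosingFlag = useClosingFlag;
        }

        void start() {
            updater = new Thread(() -> {
                try {
                    while (!closed) {
                        workQueue.take(); // blocking call (like a ZK request); close() interrupts it
                    }
                } catch (InterruptedException e) {
                    // Interrupted by close(): fall out of the loop.
                } finally {
                    // Mirrors the exit path: re-check leadership on a separate thread,
                    // since this thread has just been interrupted.
                    exitThread = new Thread(this::checkIfStillLeader);
                    exitThread.start();
                }
            }, "MiniStateUpdater");
            updater.start();
        }

        private void checkIfStillLeader() {
            if (closed || (useClosingFlag && closing)) {
                return; // shutting down: do not rejoin the election
            }
            rejoins.incrementAndGet(); // the "leak": a closing overseer rejoins the election
        }

        void close() throws InterruptedException {
            closing = true;      // written before the interrupt, so the exit path can observe it
            updater.interrupt();
            updater.join();
            exitThread.join();   // deterministic stand-in for "closed=true not yet set"
            closed = true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (boolean fix : new boolean[] {false, true}) {
            MiniOverseer overseer = new MiniOverseer(fix);
            overseer.start();
            overseer.close();
            System.out.println("closing flag checked: " + fix
                    + ", rejoin attempts: " + overseer.rejoins.get());
        }
    }
}
```

Without the flag check, the exit path sees `closed == false` and rejoins once. With the check, it sees `closing == true` and returns. Clearing the interrupted flag alone would not give the exit path this information.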
It's just a proof of concept, but it makes {{the beast}} (really) happy. How should it
be improved before going forward?
> ZkControllerTest failure. Leaking Overseer
> ------------------------------------------
>
> Key: SOLR-12200
> URL: https://issues.apache.org/jira/browse/SOLR-12200
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrCloud
> Reporter: Mikhail Khludnev
> Priority: Major
> Attachments: SOLR-12200.patch, tests-failures.txt,
> tests-failures.txt.gz, zk.fail.txt.gz
>
>
> The failures look suspiciously similar.
> [junit4] 2> 499919 INFO
> (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77])
> [n:127.0.0.1:8983_solr ] o.a.s.c.Overseer Overseer
> (id=73578760132362243-127.0.0.1:8983_solr-n_0000000000) closing
> [junit4] 2> 499920 INFO
> (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [
> ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
> [junit4] 2> 499920 ERROR
> (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_0000000000)
> [ ] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
> [junit4] 2> java.lang.InterruptedException: null
> [junit4] 2> at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
> [junit4] 2> at java.lang.Object.wait(Object.java:502)
> ~[?:1.8.0_152]
> [junit4] 2> at
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409)
> ~[zookeeper-3.4.11.jar:3.4
> Then it spins on SessionExpiredException; all tests pass, but the suite fails
> due to the leaking Overseer.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)