[ 
https://issues.apache.org/jira/browse/SOLR-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438323#comment-16438323
 ] 

Mikhail Khludnev edited comment on SOLR-12200 at 4/14/18 7:33 PM:
------------------------------------------------------------------

Attached [^SOLR-12200.patch]:
# It breaks the spin on /autoscaling on session expiration; see the "InterruptedException 
handling between solr->zk interactions" mail thread.
# -It adds a few probably redundant close() calls.-
# The cause of the leak is fixed by introducing Overseer.closing. It's just a proof of concept; 
it should probably be more elegant.

h2. Current leak scenario 
* ZkController.close() is called.
* Overseer.close() interrupts the threads, but has not yet set closed=true.
* ClusterStateUpdater exits the loop and spawns a new thread to check its 
own leadership (though I'd rather just clear the interrupted flag): 
https://github.com/apache/lucene-solr/blob/93f9a65b1c8aa460489fdce50ed84d18168b53ef/solr/core/src/java/org/apache/solr/cloud/Overseer.java#L256
* Neither the shutdown nor the closing flag is seen there, so it invokes 
{{zkController.rejoinOverseerElection(null, false);}}, which leaks the nearly closed 
Overseer. The stack trace of the leaked Overseer confirms this.

It's just a proof of concept, but it makes {{the beast}} (really) happy. How should it be 
improved before going forward?
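
To make the race in the leak scenario easier to discuss, here is a minimal, self-contained sketch of it. It is not Solr code: {{OverseerCloseSketch}}, {{markClosingFirst}} and the {{rejoins}} counter are hypothetical stand-ins, the counter increment stands in for {{zkController.rejoinOverseerElection(null, false)}}, and {{close()}} joins the updater thread before setting {{closed}} only to force the bad interleaving deterministically. It assumes the fix amounts to setting a {{closing}} flag before the threads are interrupted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the close() race; none of these names come from Solr. */
class OverseerCloseSketch {
    private volatile boolean closing;
    private volatile boolean closed;
    private final boolean markClosingFirst;
    private final AtomicInteger rejoins = new AtomicInteger();
    private final CountDownLatch running = new CountDownLatch(1);
    private final Thread updater = new Thread(this::updaterLoop, "updater-sketch");

    OverseerCloseSketch(boolean markClosingFirst) {
        this.markClosingFirst = markClosingFirst;
    }

    private void updaterLoop() {
        running.countDown();
        try {
            while (true) {
                Thread.sleep(1_000); // stands in for waiting on the work queue
            }
        } catch (InterruptedException e) {
            // The loop exits on interrupt, as in the scenario above.
        }
        // After the loop: rejoin the election unless a shutdown is visible.
        if (!closing && !closed) {
            rejoins.incrementAndGet(); // stands in for rejoinOverseerElection(null, false)
        }
    }

    void start() throws InterruptedException {
        updater.start();
        running.await(); // make sure the updater is really running before close()
    }

    void close() throws InterruptedException {
        if (markClosingFirst) {
            closing = true; // assumed fix: publish the intent to close before interrupting
        }
        updater.interrupt();
        updater.join();     // forces the worst case: updater finishes before closed is set
        closed = true;
    }

    /** Runs one start/close cycle and reports how many times the updater rejoined. */
    static int rejoinsAfterClose(boolean markClosingFirst) {
        try {
            OverseerCloseSketch overseer = new OverseerCloseSketch(markClosingFirst);
            overseer.start();
            overseer.close();
            return overseer.rejoins.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rejoins without closing flag: " + rejoinsAfterClose(false));
        System.out.println("rejoins with closing flag:    " + rejoinsAfterClose(true));
    }
}
```

Without the flag the updater rejoins once during shutdown, which corresponds to the leaked Overseer; with the flag set before the interrupt it does not rejoin. Whether a separate flag or simply reordering the existing {{closed}} assignment is the more elegant fix is exactly the open question.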


> ZkControllerTest failure. Leaking Overseer
> ------------------------------------------
>
>                 Key: SOLR-12200
>                 URL: https://issues.apache.org/jira/browse/SOLR-12200
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Mikhail Khludnev
>            Priority: Major
>         Attachments: SOLR-12200.patch, tests-failures.txt, 
> tests-failures.txt.gz, zk.fail.txt.gz
>
>
> Failure seems suspiciously the same. 
>    [junit4]   2> 499919 INFO  
> (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) 
> [n:127.0.0.1:8983_solr    ] o.a.s.c.Overseer Overseer 
> (id=73578760132362243-127.0.0.1:8983_solr-n_0000000000) closing
>    [junit4]   2> 499920 INFO  
> (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_0000000000) [    
> ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
>    [junit4]   2> 499920 ERROR 
> (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_0000000000)
>  [    ] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
>    [junit4]   2> java.lang.InterruptedException: null
>    [junit4]   2>        at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
>    [junit4]   2>        at java.lang.Object.wait(Object.java:502) 
> ~[?:1.8.0_152]
>    [junit4]   2>        at 
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) 
> ~[zookeeper-3.4.11.jar:3.4
> Then it spins in SessionExpiredException; all tests pass, but the suite fails due 
> to the leaked Overseer. 


