[ 
https://issues.apache.org/jira/browse/SOLR-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532412#comment-17532412
 ] 

Kevin Risden edited comment on SOLR-16154 at 5/5/22 4:43 PM:
-------------------------------------------------------------

I would even go with a potentially more ugly option:

T1: FireEventListener thread starts
T2: Cluster shutdown happens, including ZK shutdown
*T3: FireEventListener thread starts - due to watchers firing the listeners 
during shutdown znode changes*
T1: Because ZK is shut down, event listeners will loop in retry state until 
captured by ThreadLeak detector
T3: Because ZK is shut down, event listeners will loop in retry state until 
captured by ThreadLeak detector

and now it is:

T1: FireEventListener submitted to cc's executor E
T2: Cluster shutdown happens, waits for E to terminate
*T3: FireEventListener submitted to cc's executor E - due to watchers firing 
the listeners during shutdown znode changes - ignored by executor because E 
is already shut down, so no work is done*
T1: Completes and terminates gracefully *(my statement about 60s wait is WRONG 
here - see below)*
T2: Shut down rest of cluster and ZK.
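
A rough sketch of the T1/T2/T3 ordering above, using a plain 
java.util.concurrent executor rather than Solr's actual classes (the class and 
method names here are made up for illustration). One thing it shows: with the 
JDK's default rejection policy, a task handed to an executor after shutdown() 
is rejected with a RejectedExecutionException rather than silently dropped, so 
"ignored" here assumes the submitter catches or tolerates that rejection.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class LateSubmitDemo {

  /** Replays the T1/T2/T3 ordering and returns how many tasks actually ran. */
  static int runScenario() throws InterruptedException {
    ExecutorService e = Executors.newSingleThreadExecutor();
    AtomicInteger ran = new AtomicInteger();

    // T1: listener work submitted before shutdown; it will still be executed.
    e.execute(() -> ran.incrementAndGet());

    // T2: shutdown begins; already-submitted work may finish, new work is refused.
    e.shutdown();

    // T3: a watcher fires during shutdown and tries to submit more work.
    try {
      e.execute(() -> ran.incrementAndGet());
    } catch (RejectedExecutionException ex) {
      // Default AbortPolicy: the late task is rejected, so no work is done.
    }

    // T2 (continued): wait for T1 to complete before tearing down anything else.
    e.awaitTermination(10, TimeUnit.SECONDS);
    return ran.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("tasks run: " + runScenario());
  }
}
```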

but yes agree with the above.

It's probably safe to add some asserts or some (debug?) logging about how long 
shutdown takes? ExecutorUtil would be a good place.
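
Something along these lines is what I have in mind for the timing log. This is 
only a sketch: TimedShutdown and shutdownAndMeasureMillis are hypothetical 
names, it uses java.util.logging instead of whatever logger ExecutorUtil 
actually uses, and it is not a patch against ExecutorUtil.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TimedShutdown {
  private static final Logger LOG = Logger.getLogger(TimedShutdown.class.getName());

  /**
   * Shuts the executor down, waits up to the given timeout, and returns the
   * elapsed wall-clock time in milliseconds.
   */
  static long shutdownAndMeasureMillis(ExecutorService executor, long timeout, TimeUnit unit)
      throws InterruptedException {
    long start = System.nanoTime();
    executor.shutdown();
    boolean terminated = executor.awaitTermination(timeout, unit);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

    // Quiet (debug-level) when shutdown finished in time, loud when it did not.
    LOG.log(
        terminated ? Level.FINE : Level.WARNING,
        "executor shutdown took {0} ms (terminated={1})",
        new Object[] {elapsedMs, terminated});
    return elapsedMs;
  }
}
```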

----

My statement about 60s was wrong - I didn't realize that ExecutorUtil 
awaitTermination LOOPS when things have not shut down yet.

https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L98

I was looking at this to check whether shutdown/awaitTermination actually 
interrupts threads after some time. I was hoping it waited nicely for 60 
seconds and then called `shutdownNow` to actually interrupt the threads. It 
looks like ExecutorUtil just waits forever :(
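
For reference, the behavior I was hoping for is roughly the two-phase pattern 
from the ExecutorService javadoc: wait politely, then interrupt. The sketch 
below is not what ExecutorUtil does today; BoundedShutdown and 
shutdownWithTimeout are hypothetical names, and the timeout is a parameter 
rather than a fixed 60 seconds.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class BoundedShutdown {

  /**
   * Waits up to the timeout for a graceful shutdown, then calls shutdownNow()
   * to interrupt running tasks and waits once more. Returns true if the
   * executor terminated.
   */
  static boolean shutdownWithTimeout(ExecutorService executor, long timeout, TimeUnit unit) {
    executor.shutdown(); // stop accepting new tasks
    try {
      if (executor.awaitTermination(timeout, unit)) {
        return true; // everything finished on its own
      }
      executor.shutdownNow(); // interrupt tasks that are still running
      return executor.awaitTermination(timeout, unit);
    } catch (InterruptedException e) {
      // The caller was interrupted while waiting: cancel and preserve the flag.
      executor.shutdownNow();
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

This only helps if the stuck task responds to interruption. A thread parked in 
Thread.sleep, as in the leaked stack trace, would get an InterruptedException; 
whether the retry loop around it then exits is a separate question.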


was (Author: risdenk):
I would even go with a potentially more ugly option:

T1: FireEventListener thread starts
T2: Cluster shutdown happens, including ZK shutdown
*T1: FireEventListener thread starts - due to watchers firing the listeners 
during shutdown znode changes*
T1: Because ZK is shut down, event listeners will loop in retry state until 
captured by ThreadLeak detector

and now it is:

T1: FireEventListener submitted to cc's executor E
T2: Cluster shutdown happens, waits for E to terminate
*T1: FireEventListener submitted to cc's executor E - due to watchers firing 
the listeners during shutdown znode changes - ignored by executor because E 
is already shut down, so no work is done*
T1: Completes and terminates gracefully *(my statement about 60s wait is WRONG 
here - see below)*
T2: Shut down rest of cluster and ZK.

but yes agree with the above.

It's probably safe to add some asserts or some (debug?) logging about how long 
shutdown takes? ExecutorUtil would be a good place.

----

My statement about 60s was wrong - I didn't realize that ExecutorUtil 
awaitTermination LOOPS when things have not shut down yet.

https://github.com/apache/solr/blob/main/solr/solrj/src/java/org/apache/solr/common/util/ExecutorUtil.java#L98

I was looking at this to check whether shutdown/awaitTermination actually 
interrupts threads after some time. I was hoping it waited nicely for 60 
seconds and then called `shutdownNow` to actually interrupt the threads. It 
looks like ExecutorUtil just waits forever :(

> ZKEventListenerThread leaks from tests
> --------------------------------------
>
>                 Key: SOLR-16154
>                 URL: https://issues.apache.org/jira/browse/SOLR-16154
>             Project: Solr
>          Issue Type: Test
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seen repeatedly on Jenkins.
> {noformat}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO: 
>    1) Thread[id=1089, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestSchemaDesignerSettingsDAO]
>         at java.base@18/java.lang.Thread.sleep(Native Method)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
>         at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:361)
>         at app//org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:75)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:302)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:293)
>         at app//org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory.inform(AbstractWordsFileFilterFactory.java:88)
>         at app//org.apache.solr.core.SolrResourceLoader.informAware(SolrResourceLoader.java:762)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsInChain(ManagedIndexSchema.java:1470)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsForFieldType(ManagedIndexSchema.java:1319)
>         at app//org.apache.solr.schema.ManagedIndexSchema.postReadInform(ManagedIndexSchema.java:1307)
>         at app//org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:654)
>         at app//org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:188)
>         at app//org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:119)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:279)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:51)
>         at app//org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:342)
>         at app//org.apache.solr.core.ConfigSetService.lambda$loadConfigSet$0(ConfigSetService.java:253)
>         at app//org.apache.solr.core.ConfigSetService$$Lambda$632/0x0000000801137758.get(Unknown Source)
>         at app//org.apache.solr.core.ConfigSet.<init>(ConfigSet.java:49)
>         at app//org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:249)
>         at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1850)
>         at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
>         at app//org.apache.solr.core.SolrCore$$Lambda$742/0x00000008011f2560.run(Unknown Source)
>         at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2761)
>         at app//org.apache.solr.cloud.ZkController$$Lambda$1153/0x00000008014e8938.run(Unknown Source)
>         at java.base@18/java.lang.Thread.run(Thread.java:833)
>         at __randomizedtesting.SeedInfo.seed([DE9B93CA6D75B373]:0)
> {noformat}


