[ 
https://issues.apache.org/jira/browse/SOLR-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531952#comment-17531952
 ] 

Michael Gibney commented on SOLR-16154:
---------------------------------------

I don't know; it's really hard to tell how much of a difference this makes, 
because it doesn't reproduce reliably. To make my uneasiness a little more 
concrete: I started looking at this after [~krisden] pointed me to a build 
that had the following thread leak detection stack trace: 

{code:sh}
    com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBulkSchemaConcurrent: 
       1) Thread[id=20840, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestBulkSchemaConcurrent]
            at java.base/java.lang.Thread.sleep(Native Method)
            at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
            at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
            at app//org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:318)
            at app//org.apache.solr.cloud.ZkController.setConfWatcher(ZkController.java:2777)
            at app//org.apache.solr.cloud.ZkController.getConfDirListeners(ZkController.java:2699)
            at app//org.apache.solr.cloud.ZkController.registerConfListenerForCore(ZkController.java:2679)
            at app//org.apache.solr.core.SolrCore.registerConfListener(SolrCore.java:3345)
            at app//org.apache.solr.core.SolrCore.<init>(SolrCore.java:1183)
            at app//org.apache.solr.core.SolrCore.reload(SolrCore.java:780)
            at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1857)
            at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
            at app//org.apache.solr.core.SolrCore$$Lambda$926/0x0000000801430b18.run(Unknown Source)
            at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2762)
            at app//org.apache.solr.cloud.ZkController$$Lambda$1401/0x0000000801768770.run(Unknown Source)
            at java.base/java.lang.Thread.run(Thread.java:833)
       2) Thread[id=20888, name=searcherExecutor-12894-thread-1-processing-127.0.0.1:36919__qxp%2Fs, state=WAITING, group=TGRP-TestBulkSchemaConcurrent]
            at java.base/jdk.internal.misc.Unsafe.park(Native Method)
            at java.base/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
            at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
            at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
            at java.base/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
            at app//org.apache.solr.core.SolrCore.lambda$new$2(SolrCore.java:1142)
            at app//org.apache.solr.core.SolrCore$$Lambda$435/0x00000008010f9538.call(Unknown Source)
            at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
            at app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:259)
            at app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$294/0x0000000800fd7c90.run(Unknown Source)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
            at java.base/java.lang.Thread.run(Thread.java:833)
        at __randomizedtesting.SeedInfo.seed([7A0FDF16FA8DBCD5]:0)
{code}

The ZKEventListenerThread can run different tasks, so the stack traces vary. 
But in the one I pasted above, the concern is that once you're in 
ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161) and 
ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82), you're definitely in a 
retry loop that could last a very long time. Short of interrupting the 
running task, I don't see anything in that retry loop that short-circuits 
based on closed status or anything like that.
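
To illustrate the kind of short-circuit I mean, here is a minimal sketch of a 
retry loop that re-checks a closed flag before every attempt. This is NOT the 
actual ZkCmdExecutor code; the class name ClosableRetry, the retry() method, 
its parameters, and the linear backoff are all hypothetical and only meant to 
show the shape of the idea:

```java
import java.util.concurrent.Callable;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; this is not Solr's ZkCmdExecutor. */
final class ClosableRetry {
  private ClosableRetry() {}

  /**
   * Runs op up to maxAttempts times, sleeping between failed attempts,
   * but abandons the loop as soon as isClosed reports true.
   */
  static <T> T retry(
      Callable<T> op, int maxAttempts, long baseDelayMs, BooleanSupplier isClosed)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Short-circuit: do not start (or continue) retrying once closed.
      if (isClosed.getAsBoolean()) {
        throw new IllegalStateException("closed; abandoning retries", last);
      }
      try {
        return op.call();
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop retrying.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        last = e; // remember the failure and fall through to the delay
      }
      if (attempt < maxAttempts) {
        // Linear backoff (an assumption of this sketch); an interrupt during
        // the sleep propagates as InterruptedException.
        Thread.sleep(attempt * baseDelayMs);
      }
    }
    throw last; // every attempt failed
  }
}
```

With something like this, a caller could pass a supplier backed by its own 
closed state, and a pending retry would give up at the next attempt rather 
than running out the full retry budget. Note that the check only happens 
between sleeps, so a long individual delay would still need an interrupt to 
cut it short.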

I wish I had something more concrete to demonstrate how/whether this is 
actually problematic in practice, but unfortunately as of now I don't...

> ZKEventListenerThread leaks from tests
> --------------------------------------
>
>                 Key: SOLR-16154
>                 URL: https://issues.apache.org/jira/browse/SOLR-16154
>             Project: Solr
>          Issue Type: Test
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Major
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Seen repeatedly on Jenkins.
> {noformat}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO: 
>    1) Thread[id=1089, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestSchemaDesignerSettingsDAO]
>         at java.base@18/java.lang.Thread.sleep(Native Method)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
>         at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
>         at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:361)
>         at app//org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:75)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:302)
>         at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:293)
>         at app//org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory.inform(AbstractWordsFileFilterFactory.java:88)
>         at app//org.apache.solr.core.SolrResourceLoader.informAware(SolrResourceLoader.java:762)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsInChain(ManagedIndexSchema.java:1470)
>         at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsForFieldType(ManagedIndexSchema.java:1319)
>         at app//org.apache.solr.schema.ManagedIndexSchema.postReadInform(ManagedIndexSchema.java:1307)
>         at app//org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:654)
>         at app//org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:188)
>         at app//org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:119)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:279)
>         at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:51)
>         at app//org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:342)
>         at app//org.apache.solr.core.ConfigSetService.lambda$loadConfigSet$0(ConfigSetService.java:253)
>         at app//org.apache.solr.core.ConfigSetService$$Lambda$632/0x0000000801137758.get(Unknown Source)
>         at app//org.apache.solr.core.ConfigSet.<init>(ConfigSet.java:49)
>         at app//org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:249)
>         at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1850)
>         at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
>         at app//org.apache.solr.core.SolrCore$$Lambda$742/0x00000008011f2560.run(Unknown Source)
>         at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2761)
>         at app//org.apache.solr.cloud.ZkController$$Lambda$1153/0x00000008014e8938.run(Unknown Source)
>         at java.base@18/java.lang.Thread.run(Thread.java:833)
>       at __randomizedtesting.SeedInfo.seed([DE9B93CA6D75B373]:0)
> {noformat}


