[
https://issues.apache.org/jira/browse/SOLR-16154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17531952#comment-17531952
]
Michael Gibney commented on SOLR-16154:
---------------------------------------
I don't know; it's hard to tell how much of a difference this makes, because
the leak doesn't reproduce reliably. To make my unease a little more concrete:
I started looking at this after [~krisden] pointed me to a build with the
following thread-leak detection stack trace:
{code:sh}
com.carrotsearch.randomizedtesting.ThreadLeakError: 2 threads leaked from SUITE scope at org.apache.solr.schema.TestBulkSchemaConcurrent:
1) Thread[id=20840, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestBulkSchemaConcurrent]
    at [email protected]/java.lang.Thread.sleep(Native Method)
    at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
    at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
    at app//org.apache.solr.common.cloud.SolrZkClient.exists(SolrZkClient.java:318)
    at app//org.apache.solr.cloud.ZkController.setConfWatcher(ZkController.java:2777)
    at app//org.apache.solr.cloud.ZkController.getConfDirListeners(ZkController.java:2699)
    at app//org.apache.solr.cloud.ZkController.registerConfListenerForCore(ZkController.java:2679)
    at app//org.apache.solr.core.SolrCore.registerConfListener(SolrCore.java:3345)
    at app//org.apache.solr.core.SolrCore.<init>(SolrCore.java:1183)
    at app//org.apache.solr.core.SolrCore.reload(SolrCore.java:780)
    at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1857)
    at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
    at app//org.apache.solr.core.SolrCore$$Lambda$926/0x0000000801430b18.run(Unknown Source)
    at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2762)
    at app//org.apache.solr.cloud.ZkController$$Lambda$1401/0x0000000801768770.run(Unknown Source)
    at [email protected]/java.lang.Thread.run(Thread.java:833)
2) Thread[id=20888, name=searcherExecutor-12894-thread-1-processing-127.0.0.1:36919__qxp%2Fs, state=WAITING, group=TGRP-TestBulkSchemaConcurrent]
    at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
    at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
    at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:715)
    at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1047)
    at [email protected]/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:230)
    at app//org.apache.solr.core.SolrCore.lambda$new$2(SolrCore.java:1142)
    at app//org.apache.solr.core.SolrCore$$Lambda$435/0x00000008010f9538.call(Unknown Source)
    at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:259)
    at app//org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$294/0x0000000800fd7c90.run(Unknown Source)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at [email protected]/java.lang.Thread.run(Thread.java:833)
    at __randomizedtesting.SeedInfo.seed([7A0FDF16FA8DBCD5]:0)
{code}
The ZKEventListenerThread can run different tasks, so the stack traces vary.
But in the trace above, the concern is that once a thread is in
ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161) and
ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82), it is in a retry loop
that could last a very long time. Short of interrupting the running task, I
don't see anything in that retry loop that short-circuits based on closed
status or anything similar.
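For illustration only, here is a minimal sketch of what a retry loop that
short-circuits on closed status could look like. This is not the actual
ZkCmdExecutor code: the class name ClosableRetryExecutor, its close() method,
the linear back-off, the 50 ms sleep slice, and the choice to retry on any
non-interrupt exception are all assumptions made for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch, NOT the real ZkCmdExecutor: a retry loop that stops
 * retrying as soon as its owner has been closed.
 */
final class ClosableRetryExecutor {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final int maxRetries;
  private final long retryDelayMs;

  ClosableRetryExecutor(int maxRetries, long retryDelayMs) {
    this.maxRetries = maxRetries;
    this.retryDelayMs = retryDelayMs;
  }

  /** Marks the executor closed; running retry loops notice this promptly. */
  void close() {
    closed.set(true);
  }

  <T> T retryOperation(Callable<T> operation) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt < maxRetries; attempt++) {
      // Short-circuit: never start another attempt once closed.
      if (closed.get()) {
        throw new IllegalStateException("closed; abandoning retries", last);
      }
      try {
        return operation.call();
      } catch (InterruptedException e) {
        // Interruption is not retried; restore the flag and propagate.
        Thread.currentThread().interrupt();
        throw e;
      } catch (Exception e) {
        // Assumption: every other failure is treated as retryable here.
        last = e;
      }
      sleepUnlessClosed(retryDelayMs * (attempt + 1));
    }
    if (last == null) {
      throw new IllegalStateException("no attempts were made");
    }
    throw last;
  }

  /** Sleeps in short slices so that close() cuts the delay short. */
  private void sleepUnlessClosed(long delayMs) throws InterruptedException {
    long remaining = delayMs;
    while (remaining > 0 && !closed.get()) {
      long slice = Math.min(remaining, 50);
      Thread.sleep(slice);
      remaining -= slice;
    }
  }
}
```

With a check like this, a shutdown path that calls close() would let a thread
parked in the retry delay exit within roughly one sleep slice, without relying
on an interrupt.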
I wish I had something more concrete to demonstrate how/whether this is
actually problematic in practice, but unfortunately as of now I don't...
> ZKEventListenerThread leaks from tests
> --------------------------------------
>
> Key: SOLR-16154
> URL: https://issues.apache.org/jira/browse/SOLR-16154
> Project: Solr
> Issue Type: Test
> Reporter: Mike Drob
> Assignee: Mike Drob
> Priority: Major
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Seen repeatedly on Jenkins.
> {noformat}
> com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.designer.TestSchemaDesignerSettingsDAO:
> 1) Thread[id=1089, name=ZKEventListenerThread, state=TIMED_WAITING, group=TGRP-TestSchemaDesignerSettingsDAO]
>     at java.base@18/java.lang.Thread.sleep(Native Method)
>     at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryDelay(ZkCmdExecutor.java:161)
>     at app//org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:82)
>     at app//org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:361)
>     at app//org.apache.solr.cloud.ZkSolrResourceLoader.openResource(ZkSolrResourceLoader.java:75)
>     at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getLines(AbstractAnalysisFactory.java:302)
>     at app//org.apache.lucene.analysis.AbstractAnalysisFactory.getWordSet(AbstractAnalysisFactory.java:293)
>     at app//org.apache.lucene.analysis.en.AbstractWordsFileFilterFactory.inform(AbstractWordsFileFilterFactory.java:88)
>     at app//org.apache.solr.core.SolrResourceLoader.informAware(SolrResourceLoader.java:762)
>     at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsInChain(ManagedIndexSchema.java:1470)
>     at app//org.apache.solr.schema.ManagedIndexSchema.informResourceLoaderAwareObjectsForFieldType(ManagedIndexSchema.java:1319)
>     at app//org.apache.solr.schema.ManagedIndexSchema.postReadInform(ManagedIndexSchema.java:1307)
>     at app//org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:654)
>     at app//org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:188)
>     at app//org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:119)
>     at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:279)
>     at app//org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:51)
>     at app//org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:342)
>     at app//org.apache.solr.core.ConfigSetService.lambda$loadConfigSet$0(ConfigSetService.java:253)
>     at app//org.apache.solr.core.ConfigSetService$$Lambda$632/0x0000000801137758.get(Unknown Source)
>     at app//org.apache.solr.core.ConfigSet.<init>(ConfigSet.java:49)
>     at app//org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:249)
>     at app//org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1850)
>     at app//org.apache.solr.core.SolrCore.lambda$getConfListener$21(SolrCore.java:3394)
>     at app//org.apache.solr.core.SolrCore$$Lambda$742/0x00000008011f2560.run(Unknown Source)
>     at app//org.apache.solr.cloud.ZkController.lambda$fireEventListeners$18(ZkController.java:2761)
>     at app//org.apache.solr.cloud.ZkController$$Lambda$1153/0x00000008014e8938.run(Unknown Source)
>     at java.base@18/java.lang.Thread.run(Thread.java:833)
>     at __randomizedtesting.SeedInfo.seed([DE9B93CA6D75B373]:0)
> {noformat}