[
https://issues.apache.org/jira/browse/SOLR-13678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-13678:
----------------------------
Attachment: collectionpropswatcher-deadlock-jstack.txt
Status: Open (was: Open)
Attaching the full jstack output that I captured while observing this during a
run of {{CollectionPropsTest.testReadWriteCached}} (i.e., the source of the
snippet included in the summary).
Please note that I captured this thread dump while testing some unrelated
changes to other methods in {{CollectionPropsTest}}. I believe all of my local
changes to that test class at the time were to code farther down in the test
file than any line numbers mentioned in the thread dump, so all line numbers
should be accurate on master circa 52b5ec8068, but I'm not 100% certain. The
key thing to focus on is the line numbers and call stack for the non-test code.
I am 100% certain I had no local changes to
{{CollectionPropsTest.testReadWriteCached}} or to any non-test code.
> ZkStateReader.removeCollectionPropsWatcher can deadlock with concurrent
> zkCallback thread on props watcher
> ----------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13678
> URL: https://issues.apache.org/jira/browse/SOLR-13678
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Hoss Man
> Priority: Major
> Attachments: collectionpropswatcher-deadlock-jstack.txt
>
>
> While investigating an (unrelated) test bug in CollectionPropsTest, I
> discovered a deadlock that can occur when calling
> {{ZkStateReader.removeCollectionPropsWatcher()}} if a zkCallback thread
> concurrently tries to fire the watchers set on the collection props.
> {{ZkStateReader.removeCollectionPropsWatcher()}} is itself called when a
> {{CollectionPropsWatcher.onStateChanged()}} impl returns "true", meaning
> that, IIUC, any usage of {{CollectionPropsWatcher}} could potentially result
> in this type of deadlock.
> {noformat}
> "TEST-CollectionPropsTest.testReadWriteCached-seed#[D3C6921874D1CFEB]" #15 prio=5 os_prio=0 cpu=567.78ms elapsed=682.12s tid=0x00007fa5e8343800 nid=0x3f61 waiting for monitor entry [0x00007fa62d222000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>    at org.apache.solr.common.cloud.ZkStateReader.lambda$removeCollectionPropsWatcher$20(ZkStateReader.java:2001)
>    - waiting to lock <0x00000000e6207500> (a java.util.concurrent.ConcurrentHashMap)
>    at org.apache.solr.common.cloud.ZkStateReader$$Lambda$617/0x00000001006c1840.apply(Unknown Source)
>    at java.util.concurrent.ConcurrentHashMap.compute([email protected]/ConcurrentHashMap.java:1932)
>    - locked <0x00000000eb9156b8> (a java.util.concurrent.ConcurrentHashMap$Node)
>    at org.apache.solr.common.cloud.ZkStateReader.removeCollectionPropsWatcher(ZkStateReader.java:1994)
>    at org.apache.solr.cloud.CollectionPropsTest.testReadWriteCached(CollectionPropsTest.java:125)
> ...
> "zkCallback-88-thread-2" #213 prio=5 os_prio=0 cpu=14.06ms elapsed=672.65s tid=0x00007fa6041bf000 nid=0x402f waiting for monitor entry [0x00007fa5b8f39000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>    at java.util.concurrent.ConcurrentHashMap.compute([email protected]/ConcurrentHashMap.java:1923)
>    - waiting to lock <0x00000000eb9156b8> (a java.util.concurrent.ConcurrentHashMap$Node)
>    at org.apache.solr.common.cloud.ZkStateReader$PropsNotification.<init>(ZkStateReader.java:2262)
>    at org.apache.solr.common.cloud.ZkStateReader.notifyPropsWatchers(ZkStateReader.java:2243)
>    at org.apache.solr.common.cloud.ZkStateReader$PropsWatcher.refreshAndWatch(ZkStateReader.java:1458)
>    - locked <0x00000000e6207500> (a java.util.concurrent.ConcurrentHashMap)
>    at org.apache.solr.common.cloud.ZkStateReader$PropsWatcher.process(ZkStateReader.java:1440)
>    at org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.lambda$process$1(SolrZkClient.java:838)
>    at org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor$$Lambda$253/0x00000001004a4440.run(Unknown Source)
>    at java.util.concurrent.Executors$RunnableAdapter.call([email protected]/Executors.java:515)
>    at java.util.concurrent.FutureTask.run([email protected]/FutureTask.java:264)
>    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$140/0x0000000100308c40.run(Unknown Source)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1128)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:628)
>    at java.lang.Thread.run([email protected]/Thread.java:834)
> {noformat}
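
For readers skimming the dump: the two threads take the same two locks in opposite order. The test thread holds a {{ConcurrentHashMap$Node}} monitor (it is inside {{compute()}}) and wants the monitor on the map at {{0x00000000e6207500}}; the zkCallback thread holds that map monitor and wants the same node. The standalone sketch below reproduces only that lock-ordering pattern. It is not Solr code: the class name, the {{propsMonitor}} and {{watchers}} maps, the key, and the thread names are all hypothetical stand-ins, and it assumes a JDK whose {{ConcurrentHashMap.compute()}} synchronizes on the bin's node, as the dump above shows.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockOrderInversionDemo {

    /** Returns true once the JVM reports a monitor deadlock between the two threads. */
    static boolean reproduce() {
        // Hypothetical stand-ins: one map used as a plain monitor, one used via compute().
        ConcurrentHashMap<String, String> propsMonitor = new ConcurrentHashMap<>();
        ConcurrentHashMap<String, String> watchers = new ConcurrentHashMap<>();
        watchers.put("collection1", "watcher");

        CountDownLatch insideCompute = new CountDownLatch(1);
        CountDownLatch holdsMonitor = new CountDownLatch(1);

        // Like the test thread: node lock first (inside compute), then the map monitor.
        Thread remover = new Thread(() ->
            watchers.compute("collection1", (key, value) -> {
                insideCompute.countDown();
                awaitQuietly(holdsMonitor);
                synchronized (propsMonitor) {
                    return null; // returning null removes the entry
                }
            }), "remover");

        // Like the zkCallback thread: map monitor first, then compute() on the same key.
        Thread callback = new Thread(() -> {
            synchronized (propsMonitor) {
                holdsMonitor.countDown();
                awaitQuietly(insideCompute);
                watchers.compute("collection1", (key, value) -> value);
            }
        }, "callback");

        // Daemon threads so the stuck threads do not keep the JVM alive.
        remover.setDaemon(true);
        callback.setDaemon(true);
        remover.start();
        callback.start();

        // Poll the JVM's own deadlock detector for up to five seconds.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (System.nanoTime() < deadline) {
            long[] deadlocked = threads.findMonitorDeadlockedThreads();
            if (deadlocked != null) {
                return true;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + reproduce());
    }
}
```

The latches only make the interleaving deterministic for the demo; in the real dump the same interleaving happens by timing. Any fix would need both code paths to agree on a single lock order, or to avoid taking the second lock while still inside {{compute()}}.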