[ 
https://issues.apache.org/jira/browse/SOLR-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15477568#comment-15477568
 ] 

Michael Braun commented on SOLR-9470:
-------------------------------------

Dug more into this, and only two threads are actually part of the core deadlock:

"recoveryExecutor-3-thread-1-processing-n:x.x.x.166:8983_solr x:mycollection_shard1_replica2 s:shard1 c:mycollection r:core_node97":
{code}
        - parking to wait for  <0x00007fc1b0a97250> (a java.util.concurrent.locks.ReentrantLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:224)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1804)
        at org.apache.solr.handler.IndexFetcher.openNewSearcherAndUpdateCommitPoint(IndexFetcher.java:746)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:523)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:397)
{code}

It first acquires the iwLock (0x00007fc1b0a96fe0) by this mechanism:
org.apache.solr.update.DefaultSolrCoreState.newIndexWriter(DefaultSolrCoreState.java:210)
org.apache.solr.update.DirectUpdateHandler2.newIndexWriter(DirectUpdateHandler2.java:698)
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:520)


Then, as the stack trace above shows, it waits on the openSearcherLock 
(0x00007fc1b0a97250), which is held by the thread below:

"qtp1879034789-189":
{code}
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00007fc1b0a96fe0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
        at org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:159)
        at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:104)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1601)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1806)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1552)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1487)
        at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:115)
        at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:130)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
{code}

It's already holding the openSearcherLock (0x00007fc1b0a97250) and wants the 
iwLock. It gets the openSearcherLock by this mechanism:
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1804)
is where it does the actual openSearcherLock.lock() call, and it is called by:
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1552)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1487)
        at org.apache.solr.request.SolrQueryRequestBase.getSearcher(SolrQueryRequestBase.java:115)
        at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:130)
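
To make the cycle concrete, here is a minimal standalone sketch of the same lock-order inversion. It is not Solr code: the class name LockOrderInversionDemo, the thread names, and the 200 ms timeout are made up for illustration, and only the two lock types come from the thread dump. It also assumes the recovery thread holds the write side of the iwLock, which is what would block the read-lock tryLock seen in the second trace. Both threads use tryLock with a timeout on their second lock so the demo terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockOrderInversionDemo {

    /** Returns how many of the two threads managed to get their second lock. */
    static int run() {
        // Stand-ins for the two locks in the traces (lock types as in the dump).
        ReentrantReadWriteLock iwLock = new ReentrantReadWriteLock();
        ReentrantLock openSearcherLock = new ReentrantLock(true); // fair, like FairSync

        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger acquiredSecond = new AtomicInteger();

        // "recovery": iwLock (write side, assumed) first, then openSearcherLock.
        Thread recovery = new Thread(() -> holdThenTry(
                iwLock.writeLock(), openSearcherLock,
                bothHoldFirst, bothTried, acquiredSecond), "recovery");
        // "request": openSearcherLock first, then iwLock (read side).
        Thread request = new Thread(() -> holdThenTry(
                openSearcherLock, iwLock.readLock(),
                bothHoldFirst, bothTried, acquiredSecond), "request");

        recovery.start();
        request.start();
        try {
            recovery.join();
            request.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquiredSecond.get();
    }

    private static void holdThenTry(Lock first, Lock second,
                                    CountDownLatch bothHoldFirst,
                                    CountDownLatch bothTried,
                                    AtomicInteger acquiredSecond) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // A plain lock() here would block forever; the timeout keeps the demo finite.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                acquiredSecond.incrementAndGet();
                second.unlock();
            }
            // Keep holding the first lock until the other thread has also given up.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("threads that acquired their second lock: " + run());
    }
}
```

Neither thread can get its second lock while the other still holds it, so run() returns 0. If both paths took the locks in the same order, the cycle could not form.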

> Deadlocked threads in recovery
> ------------------------------
>
>                 Key: SOLR-9470
>                 URL: https://issues.apache.org/jira/browse/SOLR-9470
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 6.2
>            Reporter: Michael Braun
>         Attachments: solr-deadlock.txt
>
>
> Background: Booted up a cluster and replicas were in recovery. All replicas 
> recovered minus one, and it was hanging on HTTP requests. Issued shutdown and 
> solr would not shut down. Examined with JStack and found a deadlock had 
> occurred. The relevant thread information is attached. Some information has 
> been redacted as well (some custom URPs, IPs) from the stack traces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
