[ 
https://issues.apache.org/jira/browse/SOLR-17118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848378#comment-17848378
 ] 

Gus Heck commented on SOLR-17118:
---------------------------------

Sigh...

I spent a bunch of time trying to eliminate deadlocks and races in this. Seems 
like something snuck through anyway. If it can be simplified, that's certainly 
great, but IIRC (it's been a while) the complexity came from avoiding 
deadlocks, and from the mismatch between how our tests start up vs the way we 
actually start up as a server (the tests use JettySolrRunner to skip a bunch of 
standard web container stuff to speed up tests).

There's also an outside chance that there's some timing difference these days 
due to Jetty 10's move to EventListener vs LifecycleListener, but there's no 
proof of that and it's pure speculation.

> Solr deadlock during servlet container start
> --------------------------------------------
>
>                 Key: SOLR-17118
>                 URL: https://issues.apache.org/jira/browse/SOLR-17118
>             Project: Solr
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 9.2.1
>            Reporter: Andreas Hubold
>            Priority: Major
>              Labels: deadlock, servlet-context
>
> In rare cases, Solr can run into a deadlock when started. The servlet 
> container startup thread gets blocked and there's no other thread that could 
> unblock it:
> {noformat}
> "main" #1 prio=5 os_prio=0 cpu=5922.39ms elapsed=7490.27s tid=0x00007f637402ae70 nid=0x47 waiting on condition [0x00007f6379488000]
>    java.lang.Thread.State: WAITING (parking)
>     at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
>     - parking to wait for  <0x0000000081da8000> (a java.util.concurrent.CountDownLatch$Sync)
>     at java.util.concurrent.locks.LockSupport.park([email protected]/Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly([email protected]/Unknown Source)
>     at java.util.concurrent.CountDownLatch.await([email protected]/Unknown Source)
>     at org.apache.solr.servlet.CoreContainerProvider$ContextInitializationKey.waitForReadyService(CoreContainerProvider.java:523)
>     at org.apache.solr.servlet.CoreContainerProvider$ServiceHolder.getService(CoreContainerProvider.java:562)
>     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:148)
>     at org.eclipse.jetty.servlet.FilterHolder.initialize(FilterHolder.java:133)
>     at org.eclipse.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:725)
>     at org.eclipse.jetty.servlet.ServletHandler$$Lambda$315/0x00007f62fc2674b8.accept(Unknown Source)
>     at java.util.ArrayList$ArrayListSpliterator.forEachRemaining([email protected]/Unknown Source)
>     at java.util.stream.Streams$ConcatSpliterator.forEachRemaining([email protected]/Unknown Source)
>     at java.util.stream.ReferencePipeline$Head.forEach([email protected]/Unknown Source)
>     at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
>     at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392)
> {noformat}
> ContextInitializationKey.waitForReadyService should have been unblocked by 
> CoreContainerProvider#init, which is calling ServiceHolder#setService. This 
> should work because CoreContainerProvider#init is always called before 
> SolrDispatchFilter#init (ServletContextListeners are initialized before 
> Filters). 
> But there's a problem: CoreContainerProvider#init stores the 
> ContextInitializationKey and the mapped ServiceHolder in 
> CoreContainerProvider#services, and that's a *WeakHashMap*: 
> {code:java}
>       services
>           .computeIfAbsent(new ContextInitializationKey(servletContext), ServiceHolder::new)
>           .setService(this);
> {code}
> The key is not referenced anywhere else, which makes the mapping a candidate 
> for garbage collection. The ServiceHolder value also does not reference the 
> key anymore, because #setService cleared the reference. 
> With bad luck, the mapping is already gone from the WeakHashMap before 
> SolrDispatchFilter#init tries to retrieve it with 
> CoreContainerProvider#serviceForContext. And that method will then create a 
> new ContextInitializationKey and ServiceHolder, which is then used for 
> #waitForReadyService. But such a new ContextInitializationKey has never 
> received a #makeReady call, and #waitForReadyService will block forever.
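
The failure mode described in the report can be reproduced in isolation. The 
following is only a minimal sketch, not Solr code: WeakKeyDemo, Key, pinned, 
register and lookup are hypothetical stand-ins for ContextInitializationKey, 
ServiceHolder and CoreContainerProvider#serviceForContext, and the strings 
"ready" / "never made ready" stand in for a holder whose latch was or was not 
counted down. It assumes the key compares equal by the wrapped context object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

class WeakKeyDemo {
    // Hypothetical stand-in for ContextInitializationKey: equal when wrapping the same context.
    static final class Key {
        final Object context;
        Key(Object context) { this.context = context; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).context == context;
        }
        @Override public int hashCode() { return System.identityHashCode(context); }
    }

    // Weak keys: an entry may be dropped once nothing else strongly references its key.
    static final Map<Key, String> services = new WeakHashMap<>();
    // Assumption for illustration: one possible way to keep a key strongly reachable.
    static final List<Key> pinned = new ArrayList<>();

    // Mimics the listener's init: store a holder that is already "ready".
    static void register(Object context, boolean pin) {
        Key key = new Key(context);
        if (pin) {
            pinned.add(key); // the strong reference keeps the mapping alive
        }
        services.put(key, "ready");
    }

    // Mimics the filter's lookup: a missing mapping is silently replaced by a fresh holder.
    static String lookup(Object context) {
        return services.computeIfAbsent(new Key(context), k -> "never made ready");
    }

    public static void main(String[] args) {
        Object pinnedCtx = new Object();
        Object unpinnedCtx = new Object();
        register(pinnedCtx, true);
        register(unpinnedCtx, false);
        System.gc(); // only a hint; whether the unpinned entry is cleared is up to the JVM
        System.out.println("pinned:   " + lookup(pinnedCtx));
        System.out.println("unpinned: " + lookup(unpinnedCtx));
    }
}
```

The pinned lookup always finds the original holder. The unpinned lookup may 
come back with the fresh one, depending on whether a collection happened in 
between, which would match the "in rare cases" nature of the report. Keeping 
the key strongly reachable for as long as the context lives is one conceivable 
way to close the gap; whether that is the right fix for Solr is a separate 
question.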



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
