soullkk opened a new issue, #14950:
URL: https://github.com/apache/druid/issues/14950

   Coordinator is unable to respond to requests due to deadlock.
   
   ### Affected Version
   
   Druid 24.0.1
   
   ### Description
   
   - Cluster size: 3 nodes
   - Steps to reproduce (the lines wrapped in `**` are test-only additions that force the thread interleaving):
   ```
    **step1**
     public void startPollingDatabasePeriodically()
     {
       ReentrantReadWriteLock.WriteLock lock = startStopPollLock.writeLock();
       lock.lock();
       try {
         if (exec == null) {
           throw new IllegalStateException(getClass().getName() + " is not started");
         }
         if (isPollingDatabasePeriodically()) {
           return;
         }
   
         **try {
           this.testCountDownLatch.await();
         }
         catch (InterruptedException e2) {
           SqlSegmentsMetadataManager.log.warn("testmock error");
           throw new RuntimeException(e2);
         }**
   
    **step2**
     private Runnable createPollTaskForStartOrder(long startOrder, PeriodicDatabasePoll periodicDatabasePoll)
     {
       return () -> {
         // If latest poll was an OnDemandDatabasePoll that started less than periodicPollDelay,
         // We will wait for (periodicPollDelay - currentTime - LatestOnDemandDatabasePollStartTime) then check again.
         try {
           long periodicPollDelayNanos = TimeUnit.MILLISECONDS.toNanos(periodicPollDelay.getMillis());
           while (latestDatabasePoll != null
                  && latestDatabasePoll instanceof OnDemandDatabasePoll
                  && ((OnDemandDatabasePoll) latestDatabasePoll).nanosElapsedFromInitiation() < periodicPollDelayNanos) {
             long sleepNano = periodicPollDelayNanos
                              - ((OnDemandDatabasePoll) latestDatabasePoll).nanosElapsedFromInitiation();
             TimeUnit.NANOSECONDS.sleep(sleepNano);
           }
         }
         catch (Exception e) {
           log.debug(e, "Exception found while waiting for next periodic poll");
         }
         **try {
           this.testCountDownLatch2.await();
         }
         catch (InterruptedException e2) {
           SqlSegmentsMetadataManager.log.warn("testmock error");
           throw new RuntimeException(e2);
         }**
   
    **step3**
       private void useLatestIfWithinDelayOrPerformNewDatabasePoll() {
           SqlSegmentsMetadataManager.log.warn("testmock0904, run useLatestIfWithinDelayOrPerformNewDatabasePoll");
           if (this.useLatestSnapshotIfWithinDelay()) {
               return;
           }
           **this.testCountDownLatch2.countDown();**
           SqlSegmentsMetadataManager.log.warn("testmock0904, to get write lock, testCountDownLatch.countDown();");
           final ReentrantReadWriteLock.WriteLock lock = this.startStopPollLock.writeLock();
           lock.lock();
           **this.testCountDownLatch.countDown();**
           SqlSegmentsMetadataManager.log.warn("testmock0904, get write lock, testCountDownLatch.countDown();");
   ```
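
The reproduction above relies on `CountDownLatch` to force one thread to wait until another has reached a chosen point. A minimal standalone sketch of that technique follows; the class name `LatchOrderingSketch`, the method `run`, and the event strings are hypothetical and are not part of Druid.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only: forces "request" to reach its marked point
// before "poll task" is allowed to continue, as the injected latches do above.
class LatchOrderingSketch {

    public static List<String> run() throws InterruptedException {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch requestReachedLockPoint = new CountDownLatch(1);

        Thread pollTask = new Thread(() -> {
            try {
                // Blocks until the request thread calls countDown().
                requestReachedLockPoint.await();
                events.add("poll task proceeds");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread request = new Thread(() -> {
            events.add("request reaches lock point");
            // Releases the poll task only after the event above is recorded.
            requestReachedLockPoint.countDown();
        });

        pollTask.start();
        request.start();
        pollTask.join();
        request.join();
        return events;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Because `countDown()` happens-before the matching `await()` returns, the recorded order is the same on every run, regardless of which thread the scheduler starts first.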
   - The error log is as follows:
   
   ```
   2023-08-28 23:41:45,157 WARN  [qtp2013342140-225][ROOT][org.eclipse.jetty.util.thread.QueuedThreadPool] QueuedThreadPool[qtp2013342140]@780129bc{STARTED,125<=125<=125,i=0,r=-1,q=20000}[ReservedThreadExecutor@28a0d3d0{reserved=0/12,pending=12}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@29194ad2
   2023-08-28 23:41:45,157 WARN  [qtp2013342140-233][ROOT][org.eclipse.jetty.util.thread.strategy.EatWhatYouKill]
   java.util.concurrent.RejectedExecutionException: CEP:SocketChannelEndPoint@19a5ca49{l=/125.1*.*:26200,r=/125.1.*.*:51575,OSHUT,fill=FI,flush=-,to=2377/300000}{io=1/0,kio=1,kro=1}->SslConnection@8f02a8f{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=INTERESTED,flush=IDLE}~>DecryptedEndPoint@6b77c89c{l=/125.1.*.*:26200,r=/125.1.*.*:51575,OSHUT,fill=FI,flush=-,to=2377/300000}=>HttpConnection@5cfc107a[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@1b3fbd87{s=START}]=>HttpChannelOverHttp@1212483d{s=HttpChannelState@47f441cb{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}:runFillable:BLOCKING
        at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:716) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.execute(EatWhatYouKill.java:375) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
   
   2023-08-28 23:49:42,008 WARN  [qtp2013342140-227][ROOT][org.eclipse.jetty.util.thread.QueuedThreadPool] QueuedThreadPool[qtp2013342140]@780129bc{STARTED,125<=125<=125,i=0,r=-1,q=20000}[ReservedThreadExecutor@28a0d3d0{reserved=1/12,pending=5}] rejected Accept@5b12929e[java.nio.channels.SocketChannel[connected local=/125.1.*.*:26200 remote=/125.1.*.*:41719]]
   ```
   
   - Debugging done so far (thread dumps of the blocked threads):
   
   
![image](https://github.com/apache/druid/assets/55041925/e0536680-8a69-4294-9da2-b00cdaac46ce)
   ```
   "qtp2013342140-270" #270 daemon prio=5 os_prio=0 cpu=4125.02ms elapsed=442210.25s tid=0x00005646edf8c000 nid=0xb8943 waiting on condition [0x00007f55fd4d9000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000dee17ec8> (a java.util.concurrent.CompletableFuture$Signaller)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
        at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
        at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
        at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
        at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:199)
        at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1246)
        at org.apache.druid.metadata.SqlSegmentsMetadataManager.useLatestSnapshotIfWithinDelay(SqlSegmentsMetadataManager.java:443)
        at org.apache.druid.metadata.SqlSegmentsMetadataManager.useLatestIfWithinDelayOrPerformNewDatabasePoll(SqlSegmentsMetadataManager.java:416)
        at org.apache.druid.metadata.SqlSegmentsMetadataManager.getSnapshotOfDataSourcesWithAllUsedSegments(SqlSegmentsMetadataManager.java:812)
        at org.apache.druid.server.http.MetadataResource.getAllUsedSegmentsWithOvershadowedStatus(MetadataResource.java:173)
        at org.apache.druid.server.http.MetadataResource.getAllUsedSegments(MetadataResource.java:143)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:73)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:82)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.initialization.jetty.StandardResponseHeaderFilterHolder$StandardResponseHeaderFilter.doFilter(StandardResponseHeaderFilterHolder.java:161)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.security.AllowHttpMethodsResourceFilter.doFilter(AllowHttpMethodsResourceFilter.java:78)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:75)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:84)
        at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:77)
        at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
        at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:516)
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
        at org.eclipse.jetty.server.HttpChannel$$Lambda$373/1576325249.dispatch(Unknown Source)
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
        at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
        at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
        at java.lang.Thread.run(Thread.java:750)
   ```
   
![image](https://github.com/apache/druid/assets/55041925/77e969f8-e320-4367-90f4-8f1b56d6fe02)
   ```
   
   "org.apache.druid.metadata.storage.zenith.ZenithSQLMetadataSegmentManager-Exec--0" #311 daemon prio=5 os_prio=0 cpu=0.15ms elapsed=39960.13s tid=0x00005646e7c6b800 nid=0x6cad3 waiting on condition [0x00007f5605252000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000c0cc63c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:850)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:981)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1296)
        at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
        at org.apache.druid.metadata.SqlSegmentsMetadataManager.lambda$createPollTaskForStartOrder$0(SqlSegmentsMetadataManager.java:341)
        at org.apache.druid.metadata.SqlSegmentsMetadataManager$$Lambda$434/697063880.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
   ```
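
The two dumps show a request thread parked on a `CompletableFuture` inside `useLatestSnapshotIfWithinDelay` and the poll thread parked on `ReadLock.lock()`. One reading, which is an assumption because the dumps do not name the write-lock owner, is that the waiter holds the write lock while waiting for a future that only the read-lock-taking poll task can complete. A minimal sketch of that pattern, not Druid's code, follows; `LockFutureDeadlockSketch`, `waiterSeesCompletion`, and `firstPollDone` are hypothetical names, and a timeout stands in for the indefinite wait so that the demo terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration of the suspected lock/future cycle.
class LockFutureDeadlockSketch {

    /** Returns true if the waiter saw the future complete, false if it timed out. */
    public static boolean waiterSeesCompletion(boolean waitWhileHoldingWriteLock) throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CompletableFuture<Void> firstPollDone = new CompletableFuture<>();
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            if (waitWhileHoldingWriteLock) {
                lock.writeLock().lock();
            }
            try {
                // Stand-in for the poll task: it needs the read lock
                // before it can complete the future.
                exec.submit(() -> {
                    lock.readLock().lock();
                    try {
                        firstPollDone.complete(null);
                    } finally {
                        lock.readLock().unlock();
                    }
                });
                try {
                    // An untimed get() here would never return while the
                    // write lock is held by this thread.
                    firstPollDone.get(1, TimeUnit.SECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            } finally {
                if (waitWhileHoldingWriteLock) {
                    lock.writeLock().unlock();
                }
            }
        } finally {
            exec.shutdown();
            exec.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("holding write lock: " + waiterSeesCompletion(true));
        System.out.println("not holding write lock: " + waiterSeesCompletion(false));
    }
}
```

In the sketch, the wait only succeeds when the waiter does not hold the write lock, which suggests waiting on the future outside the locked region, or bounding the wait, as possible directions for a fix.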


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

