soullkk opened a new issue, #14950:
URL: https://github.com/apache/druid/issues/14950
Coordinator is unable to respond to requests due to deadlock.
### Affected Version
druid 24.0.1
### Description
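- Cluster of 3 nodes
- To reproduce, add test latches (`testCountDownLatch`, `testCountDownLatch2`) to `SqlSegmentsMetadataManager` in the three places shown below to force the interleaving: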
```
// step 1: startPollingDatabasePeriodically()
public void startPollingDatabasePeriodically()
{
  ReentrantReadWriteLock.WriteLock lock = startStopPollLock.writeLock();
  lock.lock();
  try {
    if (exec == null) {
      throw new IllegalStateException(getClass().getName() + " is not started");
    }
    if (isPollingDatabasePeriodically()) {
      return;
    }
    // test-only latch (emphasized in the original report)
    try {
      this.testCountDownLatch.await();
    }
    catch (InterruptedException e2) {
      SqlSegmentsMetadataManager.log.warn("testmock error");
      throw new RuntimeException(e2);
    }
    // ... rest of the method is not shown in the report
// step 2: createPollTaskForStartOrder()
private Runnable createPollTaskForStartOrder(long startOrder, PeriodicDatabasePoll periodicDatabasePoll)
{
  return () -> {
    // If latest poll was an OnDemandDatabasePoll that started less than periodicPollDelay,
    // We will wait for (periodicPollDelay - currentTime - LatestOnDemandDatabasePollStartTime) then check again.
    try {
      long periodicPollDelayNanos = TimeUnit.MILLISECONDS.toNanos(periodicPollDelay.getMillis());
      while (latestDatabasePoll != null
             && latestDatabasePoll instanceof OnDemandDatabasePoll
             && ((OnDemandDatabasePoll) latestDatabasePoll).nanosElapsedFromInitiation() < periodicPollDelayNanos) {
        long sleepNano = periodicPollDelayNanos
                         - ((OnDemandDatabasePoll) latestDatabasePoll).nanosElapsedFromInitiation();
        TimeUnit.NANOSECONDS.sleep(sleepNano);
      }
    }
    catch (Exception e) {
      log.debug(e, "Exception found while waiting for next periodic poll");
    }
    // test-only latch (emphasized in the original report)
    try {
      this.testCountDownLatch2.await();
    }
    catch (InterruptedException e2) {
      SqlSegmentsMetadataManager.log.warn("testmock error");
      throw new RuntimeException(e2);
    }
    // ... rest of the task is not shown in the report
// step 3: useLatestIfWithinDelayOrPerformNewDatabasePoll()
private void useLatestIfWithinDelayOrPerformNewDatabasePoll() {
  SqlSegmentsMetadataManager.log.warn("testmock0904, run useLatestIfWithinDelayOrPerformNewDatabasePoll");
  if (this.useLatestSnapshotIfWithinDelay()) {
    return;
  }
  this.testCountDownLatch2.countDown(); // test-only (emphasized in the original report)
  SqlSegmentsMetadataManager.log.warn("testmock0904, to get write lock, testCountDownLatch.countDown();");
  final ReentrantReadWriteLock.WriteLock lock = this.startStopPollLock.writeLock();
  lock.lock();
  this.testCountDownLatch.countDown(); // test-only (emphasized in the original report)
  SqlSegmentsMetadataManager.log.warn("testmock0904, get write lock, testCountDownLatch.countDown();");
  // ... rest of the method is not shown in the report
```
- The error log is as follows:
```
2023-08-28 23:41:45,157 WARN [qtp2013342140-225][ROOT][org.eclipse.jetty.util.thread.QueuedThreadPool] QueuedThreadPool[qtp2013342140]@780129bc{STARTED,125<=125<=125,i=0,r=-1,q=20000}[ReservedThreadExecutor@28a0d3d0{reserved=0/12,pending=12}] rejected org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@29194ad2
2023-08-28 23:41:45,157 WARN [qtp2013342140-233][ROOT][org.eclipse.jetty.util.thread.strategy.EatWhatYouKill]
java.util.concurrent.RejectedExecutionException: CEP:SocketChannelEndPoint@19a5ca49{l=/125.1*.*:26200,r=/125.1.*.*:51575,OSHUT,fill=FI,flush=-,to=2377/300000}{io=1/0,kio=1,kro=1}->SslConnection@8f02a8f{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=INTERESTED,flush=IDLE}~>DecryptedEndPoint@6b77c89c{l=/125.1.*.*:26200,r=/125.1.*.*:51575,OSHUT,fill=FI,flush=-,to=2377/300000}=>HttpConnection@5cfc107a[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@1b3fbd87{s=START}]=>HttpChannelOverHttp@1212483d{s=HttpChannelState@47f441cb{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}:runFillable:BLOCKING
    at org.eclipse.jetty.util.thread.QueuedThreadPool.execute(QueuedThreadPool.java:716) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.execute(EatWhatYouKill.java:375) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) ~[jetty-util-9.4.51.v20230217.jar:9.4.51.v20230217]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]
2023-08-28 23:49:42,008 WARN [qtp2013342140-227][ROOT][org.eclipse.jetty.util.thread.QueuedThreadPool] QueuedThreadPool[qtp2013342140]@780129bc{STARTED,125<=125<=125,i=0,r=-1,q=20000}[ReservedThreadExecutor@28a0d3d0{reserved=1/12,pending=5}] rejected Accept@5b12929e[java.nio.channels.SocketChannel[connected local=/125.1.*.*:26200 remote=/125.1.*.*:41719]]
```
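
The `125<=125<=125,i=0` and `q=20000` fields in the log above appear to show every Jetty worker thread busy with no idle threads and a large backlog of queued jobs, so new work is rejected. The following is a minimal sketch of that saturation pattern only. It uses the JDK's `ThreadPoolExecutor` as a stand-in for Jetty's `QueuedThreadPool` (an assumption made for illustration), with 2 workers and a queue of 2 in place of the real sizes; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SaturatedPoolSketch {

    /** Returns true if the pool rejected a task once all workers were stuck and the queue was full. */
    public static boolean rejectsWhenSaturated() throws InterruptedException {
        // Stands in for whatever the real request threads are blocked on.
        CountDownLatch stuck = new CountDownLatch(1);
        // 2 workers and a queue of 2: illustrative sizes, not the ones from the log.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(2));
        boolean rejected = false;
        try {
            // Tasks 1-2 occupy the workers, tasks 3-4 fill the queue, task 5 is rejected.
            for (int i = 0; i < 5; i++) {
                pool.execute(() -> {
                    try {
                        stuck.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } catch (RejectedExecutionException e) {
            rejected = true;
        } finally {
            // Unblock the workers so the sketch terminates.
            stuck.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected: " + rejectsWhenSaturated());
    }
}
```

In the sketch the workers are released at the end; in the reported situation the request threads stay parked, so the pool never drains.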
- Debugging done so far: thread dumps of a blocked Jetty request thread and of the metadata polling thread:

```
"qtp2013342140-270" #270 daemon prio=5 os_prio=0 cpu=4125.02ms elapsed=442210.25s tid=0x00005646edf8c000 nid=0xb8943 waiting on condition [0x00007f55fd4d9000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000dee17ec8> (a java.util.concurrent.CompletableFuture$Signaller)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
    at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
    at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:199)
    at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1246)
    at org.apache.druid.metadata.SqlSegmentsMetadataManager.useLatestSnapshotIfWithinDelay(SqlSegmentsMetadataManager.java:443)
    at org.apache.druid.metadata.SqlSegmentsMetadataManager.useLatestIfWithinDelayOrPerformNewDatabasePoll(SqlSegmentsMetadataManager.java:416)
    at org.apache.druid.metadata.SqlSegmentsMetadataManager.getSnapshotOfDataSourcesWithAllUsedSegments(SqlSegmentsMetadataManager.java:812)
    at org.apache.druid.server.http.MetadataResource.getAllUsedSegmentsWithOvershadowedStatus(MetadataResource.java:173)
    at org.apache.druid.server.http.MetadataResource.getAllUsedSegments(MetadataResource.java:143)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:73)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:82)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.initialization.jetty.StandardResponseHeaderFilterHolder$StandardResponseHeaderFilter.doFilter(StandardResponseHeaderFilterHolder.java:161)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.security.AllowHttpMethodsResourceFilter.doFilter(AllowHttpMethodsResourceFilter.java:78)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:75)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:84)
    at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:77)
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
    at org.eclipse.jetty.server.HttpChannel$$Lambda$373/1576325249.dispatch(Unknown Source)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.lang.Thread.run(Thread.java:750)
```

```
"org.apache.druid.metadata.storage.zenith.ZenithSQLMetadataSegmentManager-Exec--0" #311 daemon prio=5 os_prio=0 cpu=0.15ms elapsed=39960.13s tid=0x00005646e7c6b800 nid=0x6cad3 waiting on condition [0x00007f5605252000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000c0cc63c8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:850)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:981)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1296)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at org.apache.druid.metadata.SqlSegmentsMetadataManager.lambda$createPollTaskForStartOrder$0(SqlSegmentsMetadataManager.java:341)
    at org.apache.druid.metadata.SqlSegmentsMetadataManager$$Lambda$434/697063880.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
```
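
One possible reading of the two dumps above, offered as an assumption rather than a confirmed diagnosis: the request thread waits on a future that only the polling task can complete, while the polling task waits for the read lock of a `ReentrantReadWriteLock` whose write lock is held by the waiting side. The sketch below reproduces only that cycle in isolation. It is not Druid's code; `firstPollDone`, `pollExec`, and the class name are hypothetical, and a timed `get` replaces the untimed wait so the example terminates instead of hanging.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriteLockFutureDeadlockSketch {

    /**
     * Returns true if the caller timed out waiting for the future while holding the write lock,
     * which is the cycle that would block forever with an untimed wait.
     */
    public static boolean demonstrateCycle() throws Exception {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CompletableFuture<Void> firstPollDone = new CompletableFuture<>();
        ExecutorService pollExec = Executors.newSingleThreadExecutor();
        boolean timedOut = false;

        // "Request thread": takes the write lock, then waits for the poll to finish.
        lock.writeLock().lock();
        try {
            // "Poll task": must take the read lock before it can complete the future.
            pollExec.submit(() -> {
                lock.readLock().lock();
                try {
                    firstPollDone.complete(null);
                } finally {
                    lock.readLock().unlock();
                }
            });
            try {
                // Timed wait so the sketch ends; an untimed get() here would never return.
                firstPollDone.get(500, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true;
            }
        } finally {
            lock.writeLock().unlock();
        }

        // Once the write lock is released, the poll task can proceed.
        firstPollDone.get(5, TimeUnit.SECONDS);
        pollExec.shutdown();
        return timedOut;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out while holding write lock: " + demonstrateCycle());
    }
}
```

If this reading applies, avoiding the wait on the future while the write lock is held (or bounding that wait) would break the cycle; whether that matches the actual code path at the line numbers in the dumps would need to be checked against the 24.0.1 source.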