gianm commented on issue #12904:
URL: https://github.com/apache/druid/issues/12904#issuecomment-1253063416

   Slack thread mentioning an issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1663715405113769. Reproducing some info here.
   
   > Since switching to the Kubernetes extension instead of ZooKeeper, I have been seeing an issue and am curious if anyone else has seen it. We are running 0.23.0 with Indexers instead of MiddleManagers. When an Indexer pod goes away, we begin seeing errors like the following in the coordinator logs (stack trace and details in thread):
   
   ```
   {
     "level": "ERROR",
     "thread": "HttpServerInventoryView-4",
     "message": "failed to get sync response from [http://10.4.132.249:8091/_1663714827177]. Return code [0], Reason: [null]",
     "exception": {
       "exception_class": "org.jboss.netty.channel.ChannelException",
       "exception_message": "Faulty channel in resource pool",
       "stacktrace": "org.jboss.netty.channel.ChannelException: Faulty channel in resource pool\n\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)\n\tat org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)\n\tat org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)\n\tat org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)\n\tat org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)\n\t... 3 more\n"
     },
     "hostName": "storage--druid-coordinator-8454fd4cf5-zz94r"
   }
   ```
   
   
   ```
   org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
     at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)
     at org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)
     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
     at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
     at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091
     at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
     at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
     at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
     ... 3 more
   ```
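
   For context on the setup described above, "using the Kubernetes extension" means running with ZooKeeper disabled and HTTP-based server views, broadly along these lines. This is a sketch based on the `druid-kubernetes-extensions` docs, not the reporter's exact configuration; verify the property names against the docs for your Druid version:
   
   ```
   druid.extensions.loadList=["druid-kubernetes-extensions"]
   druid.zk.service.enabled=false
   druid.discovery.type=k8s
   druid.serverview.type=http
   druid.coordinator.loadqueuepeon.type=http
   druid.indexer.runner.type=httpRemote
   ```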
   
   > It appears that once it gets into this state, it will continue to retry indefinitely, and eventually the coordinator becomes bogged down and unresponsive.
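
   The indefinite retry is the crux: each `HttpServerInventoryView` sync thread keeps calling `ChangeRequestHttpSyncer.sync()` against the departed pod's IP. As a minimal sketch of the alternative behavior, here is a bounded retry loop with capped exponential backoff. The class name, endpoint handling, and limits are hypothetical; this is not Druid's actual syncer code:
   
   ```
   import java.io.IOException;
   import java.net.URI;
   import java.net.http.HttpClient;
   import java.net.http.HttpRequest;
   import java.net.http.HttpResponse;
   import java.time.Duration;
   
   // Hypothetical illustration only; not Druid's ChangeRequestHttpSyncer.
   public class BoundedSyncSketch
   {
     private static final int MAX_ATTEMPTS = 8;                         // made-up attempt budget
     private static final Duration MAX_BACKOFF = Duration.ofMinutes(1); // made-up backoff cap
   
     private final HttpClient client = HttpClient.newBuilder()
         .connectTimeout(Duration.ofSeconds(5)) // fail fast on unreachable pods
         .build();
   
     /**
      * Tries to sync with the endpoint, doubling the wait between failures
      * up to MAX_BACKOFF. Returns false once the attempt budget is spent,
      * so the caller can drop the server rather than retry forever.
      */
     public boolean sync(URI endpoint) throws InterruptedException
     {
       Duration backoff = Duration.ofSeconds(1);
       for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
         try {
           HttpResponse<Void> response = client.send(
               HttpRequest.newBuilder(endpoint).GET().build(),
               HttpResponse.BodyHandlers.discarding()
           );
           if (response.statusCode() == 200) {
             return true;
           }
         }
         catch (IOException e) {
           // Connection refused or timed out: the pod may be gone permanently.
         }
         if (attempt < MAX_ATTEMPTS) {
           Thread.sleep(backoff.toMillis());
           backoff = backoff.multipliedBy(2);
           if (backoff.compareTo(MAX_BACKOFF) > 0) {
             backoff = MAX_BACKOFF;
           }
         }
       }
       return false;
     }
   }
   ```
   
   Capped backoff keeps a briefly flapping pod cheap to re-probe, while the attempt budget ensures a pod that is gone for good eventually frees the sync thread instead of tying it up indefinitely.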

