gianm commented on issue #12904: URL: https://github.com/apache/druid/issues/12904#issuecomment-1253063416
Slack thread mentioning an issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1663715405113769. Reproducing some info here.

> Since switch to using the Kubernetes extension instead of Zookeeper, I have been seeing an issue and I am curious if anyone else has seen it. We are running 0.23.0 with indexers instead of middlemanagers. When an indexer pod goes away, we will begin seeing errors like the following in the coordinator logs (stack trace and details in thread)

```
{
  "level": "ERROR",
  "thread": "HttpServerInventoryView-4",
  "message": "failed to get sync response from [http://10.4.132.249:8091/_1663714827177]. Return code [0], Reason: [null]",
  "exception": {
    "exception_class": "org.jboss.netty.channel.ChannelException",
    "exception_message": "Faulty channel in resource pool",
    "stacktrace": "org.jboss.netty.channel.ChannelException: Faulty channel in resource pool\n\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)\n\tat org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)\n\tat org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)\n\tat org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)\n\tat org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)\n\tat org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)\n\t... 3 more\n"
  },
  "hostName": "storage--druid-coordinator-8454fd4cf5-zz94r"
}
```

```
org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
    at org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:131)
    at org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.4.132.249:8091
    at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:139)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    ... 3 more
```

> It appears that once it gets into this state it will continue to retry indefinitely, and eventually the coordinator becomes bogged down and non-responsive
