hoesler opened a new issue #6130: Unhandled Errors in Curator leave Druid Nodes in disconnected state
URL: https://github.com/apache/incubator-druid/issues/6130

I am running Druid (0.12.1) and Zookeeper (3.4.10) on Kubernetes. Occasionally, for some pods, the Zookeeper Service URL becomes temporarily unresolvable. This results in Curator reporting the following error:

```
2018-06-25T01:14:00,607 ERROR [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.net.UnknownHostException: dev-druid-zookeeper.druid
	at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[?:1.8.0_151]
	at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_151]
	at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_151]
	at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
	at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
	at org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29) ~[curator-client-4.0.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:191) ~[curator-framework-4.0.0.jar:4.0.0]
	at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:101) ~[curator-client-4.0.0.jar:?]
	at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:57) ~[curator-client-4.0.0.jar:?]
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:99) ~[curator-client-4.0.0.jar:?]
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:141) ~[curator-client-4.0.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:938) [curator-framework-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:912) [curator-framework-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:70) [curator-framework-4.0.0.jar:4.0.0]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:316) [curator-framework-4.0.0.jar:4.0.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
```

After this, the Druid node is still up but won't re-register with Zookeeper, leaving it unavailable to the cluster. A manual restart helps, but is undesirable.

After some research, I found that the root cause is a Curator issue reported in [CURATOR-229](https://issues.apache.org/jira/browse/CURATOR-229). One of the solutions proposed there is to [listen for unhandled curator errors](https://issues.apache.org/jira/browse/CURATOR-229?focusedCommentId=15963384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15963384). I will open a PR for this.

Related Druid Issue: https://github.com/apache/incubator-druid/issues/2495
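
For illustration, the "listen for unhandled curator errors" idea could look roughly like the sketch below. It is not the code of the planned PR. To keep it runnable without Curator on the classpath, `ErrorListener` is a hypothetical stand-in that mirrors the shape of Curator's `UnhandledErrorListener`, and `stopOnUnhandledError` and `stopAction` are names invented for this example. The assumption is that stopping the process on such an error lets Kubernetes restart the pod, so the node can register with Zookeeper again.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch only: stand-in types, not Druid's or Curator's actual code. */
final class UnhandledErrorSketch {

    /** Hypothetical stand-in mirroring the shape of Curator's UnhandledErrorListener. */
    interface ErrorListener {
        void unhandledError(String message, Throwable e);
    }

    /**
     * Returns a listener that logs every unhandled error and runs
     * stopAction exactly once, on the first error it receives.
     * stopAction is an assumed hook, e.g. stopping the node's lifecycle.
     */
    static ErrorListener stopOnUnhandledError(Runnable stopAction) {
        AtomicBoolean stopped = new AtomicBoolean(false);
        return (message, e) -> {
            System.err.println("Unhandled Curator error: " + message + " (" + e + ")");
            // Guard so that repeated errors do not trigger a second stop.
            if (stopped.compareAndSet(false, true)) {
                try {
                    stopAction.run();
                } catch (RuntimeException stopFailure) {
                    System.err.println("Stop action failed: " + stopFailure);
                }
            }
        };
    }
}
```

With the real library, an equivalent listener would presumably be registered through `CuratorFramework.getUnhandledErrorListenable().addListener(...)`; what the stop action should do (exit the JVM, stop a lifecycle, or something else) is left open here.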
