hoesler opened a new issue #6130: Unhandled Errors in Curator leave Druid Nodes 
in disconnected state
URL: https://github.com/apache/incubator-druid/issues/6130
 
 
   I am running Druid (0.12.1) and ZooKeeper (3.4.10) on Kubernetes. 
Occasionally, the ZooKeeper Service hostname becomes temporarily unresolvable 
for some pods. When that happens, Curator reports the following error:
   ```
   2018-06-25T01:14:00,607 ERROR [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
   java.net.UnknownHostException: dev-druid-zookeeper.druid
        at java.net.InetAddress.getAllByName0(InetAddress.java:1280) ~[?:1.8.0_151]
        at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_151]
        at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_151]
        at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
        at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
        at org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29) ~[curator-client-4.0.0.jar:?]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:191) ~[curator-framework-4.0.0.jar:4.0.0]
        at org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:101) ~[curator-client-4.0.0.jar:?]
        at org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:57) ~[curator-client-4.0.0.jar:?]
        at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:99) ~[curator-client-4.0.0.jar:?]
        at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:141) ~[curator-client-4.0.0.jar:?]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:938) [curator-framework-4.0.0.jar:4.0.0]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:912) [curator-framework-4.0.0.jar:4.0.0]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$300(CuratorFrameworkImpl.java:70) [curator-framework-4.0.0.jar:4.0.0]
        at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:316) [curator-framework-4.0.0.jar:4.0.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_151]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
   ```
   
   After this, the Druid node stays up but does not re-register with 
ZooKeeper, leaving it unavailable to the cluster. A manual restart fixes 
this, but is undesirable.
   
   After some research, I found that the root cause is a Curator issue 
reported in [CURATOR-229](https://issues.apache.org/jira/browse/CURATOR-229). 
One of the solutions proposed there is to [listen for unhandled Curator 
errors](https://issues.apache.org/jira/browse/CURATOR-229?focusedCommentId=15963384&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15963384).
 I will open a PR for this.
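
To illustrate the listener idea, here is a minimal, JDK-only sketch rather than the actual fix. `ErrorListener` and `MiniClient` are hypothetical stand-ins: the first mirrors the method shape of Curator's `UnhandledErrorListener`, the second imitates a client whose background operation fails with the `UnknownHostException` from the log above. Requesting a shutdown inside the listener is an assumption about one possible reaction (so that Kubernetes could restart the pod), not necessarily what the PR will do.

```java
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in with the same method shape as Curator's
// org.apache.curator.framework.api.UnhandledErrorListener.
interface ErrorListener {
    void unhandledError(String message, Throwable e);
}

// Hypothetical miniature client: runs one background operation and hands
// any failure to the registered listeners instead of only logging it.
class MiniClient {
    private final List<ErrorListener> listeners = new CopyOnWriteArrayList<>();

    void addListener(ErrorListener listener) {
        listeners.add(listener);
    }

    void performBackgroundOperation(Callable<?> operation) {
        try {
            operation.call();
        } catch (Exception e) {
            for (ErrorListener listener : listeners) {
                listener.unhandledError(
                    "Background exception was not retry-able or retry gave up", e);
            }
        }
    }
}

class UnhandledErrorDemo {
    // Returns true if the listener requested a shutdown.
    static boolean run(Callable<?> operation) {
        AtomicBoolean shutdownRequested = new AtomicBoolean(false);
        MiniClient client = new MiniClient();
        // With real Curator, the registration would go through
        // curator.getUnhandledErrorListenable().addListener(...).
        client.addListener((message, e) -> {
            System.err.println("Unhandled error: " + message + " (" + e + ")");
            // Assumed reaction: flag the node for shutdown so that an
            // orchestrator can restart it with a fresh connection.
            shutdownRequested.set(true);
        });
        client.performBackgroundOperation(operation);
        return shutdownRequested.get();
    }

    public static void main(String[] args) {
        // Simulate the DNS failure from the stack trace above.
        boolean requested = run(() -> {
            throw new UnknownHostException("dev-druid-zookeeper.druid");
        });
        System.out.println("shutdown requested: " + requested);
    }
}
```

The simulated failure is injected as a `Callable` so the sketch does not depend on real DNS behaviour. In a real node, the flag would be replaced by whatever shutdown hook the process already has.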
   
   Related Druid Issue: https://github.com/apache/incubator-druid/issues/2495

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
