[ https://issues.apache.org/jira/browse/KAFKA-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16461448#comment-16461448 ]
David Glasser commented on KAFKA-6843:
--------------------------------------

Sorry, I may be wrong here. I think the default for these properties got changed in Java 1.7 or so to be OK, and the real problem is a Zookeeper issue (ZOOKEEPER-2184) which is hopefully fixed in Kafka 1.1 (KAFKA-5473). We are still on 1.0 and are planning to upgrade now.

> Document issue with DNS TTL
> ---------------------------
>
>                 Key: KAFKA-6843
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6843
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: David Glasser
>            Priority: Major
>
> We run Kafka and Zookeeper in Google Kubernetes Engine. We have recently had
> problems where our brokers had serious problems when GKE replaced our cluster
> (cycling both Zookeeper and Kafka in parallel). Kafka (1.0) brokers lost the
> ability to talk to Zookeeper, and eventually failed their controlled
> shutdown, leading to slow startup times for the new broker and outages for
> our system.
> We eventually tracked this down to the fact that (at least in our
> environment) the default JVM DNS caching behavior is to cache results
> forever. We rely on DNS to connect to Zookeeper, and the DNS resolution
> changes when the Zookeeper pods are replaced.
> The fix is straightforward: setting the property networkaddress.cache.ttl or
> sun.net.inetaddr.ttl to make the caching non-infinite (or use a "security
> manager"). See
> [https://docs.oracle.com/javase/8/docs/technotes/guides/net/properties.html]
> for details.
> I think this gotcha should be documented. Probably at
> [https://kafka.apache.org/11/documentation/#java] ? I'm happy to submit a PR
> if people agree this is the right place. (I suppose somehow fixing this in
> code would be nice too.)
> By the way, if you search the Apache issue tracker for
> [networkaddress.cache.ttl|https://issues.apache.org/jira/browse/JAMES-774?jql=text%20~%20%22%5C%22networkaddress.cache.ttl%5C%22%22],
> you'll learn that this is a common issue faced by many Apache Java projects.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
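
For illustration, here is a minimal Java sketch of the workaround named in the issue description: setting networkaddress.cache.ttl so that successful DNS lookups are not cached forever. The class name DnsTtlExample and the 60-second value are assumptions made for this example and do not come from the issue. The sketch assumes it runs before the JVM resolves its first hostname, since the caching policy is normally read only once. As the comment above notes, the default may already be finite on newer JVMs, in which case this would be unnecessary.

```java
import java.security.Security;

public class DnsTtlExample {
    // Hypothetical TTL in seconds; choose a value that suits your environment.
    static final String TTL_SECONDS = "60";

    // Sets the security property that controls how long successful
    // DNS lookups are cached, then returns the value now in effect.
    static String configureDnsTtl() {
        Security.setProperty("networkaddress.cache.ttl", TTL_SECONDS);
        return Security.getProperty("networkaddress.cache.ttl");
    }

    public static void main(String[] args) {
        // Call this early, before any InetAddress lookup takes place.
        String ttl = configureDnsTtl();
        System.out.println("networkaddress.cache.ttl = " + ttl);
    }
}
```

Without a code change, networkaddress.cache.ttl can be set in the JRE's java.security file, and sun.net.inetaddr.ttl can be passed as a system property on the command line (for example -Dsun.net.inetaddr.ttl=60, again with an assumed value).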