Piyush Narang created FLINK-17443:
-------------------------------------
Summary: Flink's ZK in HA mode setup is unable to start up if any
of the zk hosts are unreachable
Key: FLINK-17443
URL: https://issues.apache.org/jira/browse/FLINK-17443
Project: Flink
Issue Type: Bug
Reporter: Piyush Narang
We occasionally hit an issue where our Flink cluster will not start up if any of
the ZooKeeper hosts passed in the "high-availability.zookeeper.quorum" config
setting are unreachable (in the case below, the host name cannot be resolved).
A sample error is shown below.
This error seems to stem from us using an older ZooKeeper dependency version
(3.4.10). It has been fixed as part of
https://issues.apache.org/jira/browse/ZOOKEEPER-1576 in the 3.4.x branch
([https://github.com/apache/zookeeper/commit/be1409cc9a14ac2e28693e0e02a0ba6d9713565e]).
{code:java}
java.net.UnknownHostException: zk01-pa4.hpc.criteo.prod: Name or service not known
	at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:61)
	at org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
	at org.apache.flink.shaded.curator.org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeeperFactory.java:29)
	at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl$2.newZooKeeper(CuratorFrameworkImpl.java:150)
	at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder$1.getZooKeeper(HandleHolder.java:94)
	at org.apache.flink.shaded.curator.org.apache.curator.HandleHolder.getZooKeeper(HandleHolder.java:55)
	at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.reset(ConnectionState.java:262)
	at org.apache.flink.shaded.curator.org.apache.curator.ConnectionState.start(ConnectionState.java:109)
	at org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient.start(CuratorZookeeperClient.java:191)
	at org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl.start(CuratorFrameworkImpl.java:259)
	at org.apache.flink.runtime.util.ZooKeeperUtils.startCuratorFramework(ZooKeeperUtils.java:131)
	at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:123)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.createHaServices(ClusterEntrypoint.java:292)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.initializeServices(ClusterEntrypoint.java:257)
{code}
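To illustrate the failure mode, here is a rough sketch of the difference between resolving every quorum host strictly (one unresolvable name aborts startup, which matches the trace above) and resolving tolerantly (unresolvable names are skipped as long as at least one host resolves, which is my understanding of the behaviour after ZOOKEEPER-1576). This is not the actual ZooKeeper or Flink code: the class `QuorumResolutionSketch`, the `Resolver` interface, and all method names are hypothetical and exist only for illustration. The host parsing is also simplified and assumes plain `host:port` entries.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class QuorumResolutionSketch {

    /** Hypothetical seam so the DNS lookup can be swapped out (e.g. in tests). */
    public interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Splits a quorum string such as "zk1:2181,zk2:2181" into bare host names. */
    public static List<String> hosts(String quorum) {
        List<String> result = new ArrayList<>();
        for (String entry : quorum.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            result.add(colon >= 0 ? trimmed.substring(0, colon) : trimmed);
        }
        return result;
    }

    /** Strict variant: the first unresolvable host propagates UnknownHostException. */
    public static List<InetAddress> resolveStrict(String quorum, Resolver resolver)
            throws UnknownHostException {
        List<InetAddress> addresses = new ArrayList<>();
        for (String host : hosts(quorum)) {
            for (InetAddress address : resolver.resolve(host)) {
                addresses.add(address);
            }
        }
        return addresses;
    }

    /** Tolerant variant: unresolvable hosts are logged and skipped. */
    public static List<InetAddress> resolveTolerant(String quorum, Resolver resolver) {
        List<InetAddress> addresses = new ArrayList<>();
        for (String host : hosts(quorum)) {
            try {
                for (InetAddress address : resolver.resolve(host)) {
                    addresses.add(address);
                }
            } catch (UnknownHostException e) {
                System.err.println("Skipping unresolvable ZooKeeper host: " + host);
            }
        }
        // Still fail if nothing at all could be resolved.
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("No ZooKeeper host in the quorum could be resolved");
        }
        return addresses;
    }

    public static void main(String[] args) {
        // Pass the value of high-availability.zookeeper.quorum as the first argument.
        String quorum = args.length > 0 ? args[0] : "localhost:2181";
        System.out.println(resolveTolerant(quorum, InetAddress::getAllByName));
    }
}
```

Running `main` with the configured quorum string on a JobManager host is a quick way to see which entries fail to resolve before starting the cluster.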