Sergey Edunov created GIRAPH-1098:
-------------------------------------

             Summary: Job may get stuck if zookeeper port fixed and is in use
                 Key: GIRAPH-1098
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1098
             Project: Giraph
          Issue Type: Bug
            Reporter: Sergey Edunov
            Assignee: Sergey Edunov

We see jobs getting stuck indefinitely if the ZooKeeper port is already in use:

INFO 2016-07-19 16:08:29,168 [main] org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port ::/0:0:0:0:0:0:0:0:22181
ERROR 2016-07-19 16:08:29,168 [main] org.apache.giraph.zk.InProcessZooKeeperRunner - Unable to start zookeeper
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
	at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196)
	at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154)
	at org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97)
	at org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52)
	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476)
	at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447)
	at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247)
	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627)
	at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301)
	at org.apache.hadoop.mapred.Task.run(Task.java:604)
	at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)