[ https://issues.apache.org/jira/browse/HAMA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lujing.zui resolved HAMA-890.
-----------------------------
    Resolution: Won't Fix

A system hostname problem causes ZooKeeper to exit, which in turn causes this problem.

> PipesApplication connect to ZooKeeperSyncClinetImpl always timeout
> ------------------------------------------------------------------
>
>                 Key: HAMA-890
>                 URL: https://issues.apache.org/jira/browse/HAMA-890
>             Project: Hama
>          Issue Type: Bug
>   Affects Versions: 0.7.0
>         Environment: Hadoop 2.2.0 distribute mode
>            Reporter: lujing.zui
>
> I built a cluster that contains 4 groomservers.
> I ran a Pipes application, matrixmultiplication, and on one groomserver it
> failed to connect to ZooKeeperSyncClient, so the entire job failed.
> On the other groomservers, everything is fine.
> I rebooted the problematic node, but that did not solve the problem.
> As I understand it, both sides of this connection are on the same node, so a
> connection accept timeout seems impossible. iptables is off, the network is
> normal, and pinging every node works.
> I am confused; can anyone help me or give me a hint or suggestion?
> Thanks so much!
> The log is listed below:
> 14/03/15 16:21:05 INFO ipc.Server: Starting Socket Reader #1 for port 61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server Responder: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server listener on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 0 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 2 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 1 on 61002: starting
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 3 on 61002: starting
> 14/03/15 16:21:05 INFO message.HamaMessageManagerImpl: BSPPeer address:hd1.hadoop.lab port:61002
> 14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 4 on 61002: starting
> 14/03/15 16:21:05 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
> 14/03/15 16:21:05 INFO sync.ZKSyncClient: Initializing ZK Sync Client
> 14/03/15 16:21:05 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
> java.net.SocketTimeoutException: Accept timed out
>       at java.net.PlainSocketImpl.socketAccept(Native Method)
>       at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>       at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>       at java.net.ServerSocket.accept(ServerSocket.java:446)
>       at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>       at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>       at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>       at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>       at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Error cleaning up after bsp executed.
> java.lang.NullPointerException
>       at org.apache.hama.pipes.PipesBSP.cleanup(PipesBSP.java:95)
>       at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
>       at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>       at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> 14/03/15 16:21:08 INFO ipc.Server: Stopping server on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 2 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server listener on 61002
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 3 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 4 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 1 on 61002: exiting
> 14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server Responder
> 14/03/15 16:21:08 ERROR bsp.BSPTask: Shutting down ping service.
> 14/03/15 16:21:08 FATAL bsp.GroomServer: Error running child
> java.net.SocketTimeoutException: Accept timed out
>       at java.net.PlainSocketImpl.socketAccept(Native Method)
>       at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>       at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>       at java.net.ServerSocket.accept(ServerSocket.java:446)
>       at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>       at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>       at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>       at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>       at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
> java.net.SocketTimeoutException: Accept timed out
>       at java.net.PlainSocketImpl.socketAccept(Native Method)
>       at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
>       at java.net.ServerSocket.implAccept(ServerSocket.java:478)
>       at java.net.ServerSocket.accept(ServerSocket.java:446)
>       at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
>       at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
>       at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>       at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>       at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
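
The resolution above attributes the failure to a hostname problem on the affected node but does not say what the misconfiguration was. For anyone hitting the same symptom, the following is a minimal diagnostic sketch, not part of Hama, that prints how the JVM resolves the local hostname on a node. The class name HostnameCheck and the helper isSuspicious are hypothetical, and treating a loopback or wildcard address as the likely culprit is an assumption; the report itself does not confirm it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic, not part of Hama: shows how the JVM resolves
// the local hostname so it can be compared across groomserver nodes.
public class HostnameCheck {

    // Assumption: an address that remote nodes cannot reach (loopback such
    // as 127.0.0.1 or 127.0.1.1, or the wildcard 0.0.0.0) hints at a
    // hostname misconfiguration, for example in /etc/hosts.
    static boolean isSuspicious(InetAddress addr) {
        return addr.isLoopbackAddress() || addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        try {
            InetAddress local = InetAddress.getLocalHost();
            System.out.println("hostname:  " + local.getHostName());
            System.out.println("address:   " + local.getHostAddress());
            System.out.println("canonical: " + local.getCanonicalHostName());
            if (isSuspicious(local)) {
                System.out.println("WARNING: local hostname resolves to a loopback or wildcard address");
            }
        } catch (UnknownHostException e) {
            // The local hostname has no usable mapping at all.
            System.out.println("ERROR: local hostname does not resolve: " + e.getMessage());
        }
    }
}
```

Running this on the failing node and on a healthy one, then comparing the printed address with the one in the log line "Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002", may help narrow down whether name resolution differs between nodes.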