[ https://issues.apache.org/jira/browse/HAMA-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lujing.zui updated HAMA-890:
----------------------------

    Description: 
I built a cluster containing 4 groomservers.
I ran a Pipes application (matrixmultiplication). On one groomserver the task
fails with a timeout that appears to occur while connecting to
ZooKeeperSyncClient, so the entire job fails. On the other groomservers
everything is fine.
Rebooting the problematic node did not solve the problem.

As I understand it, both sides of this connection are on the same node, so a
connection accept timeout should be impossible. iptables is off, the network is
normal, and every node responds to ping.
I am confused. Can anyone help, or offer a hint or suggestion?
Thanks so much!

The log is listed below:
14/03/15 16:21:05 INFO ipc.Server: Starting Socket Reader #1 for port 61002
14/03/15 16:21:05 INFO ipc.Server: IPC Server Responder: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server listener on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 0 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 2 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 1 on 61002: starting
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 3 on 61002: starting
14/03/15 16:21:05 INFO message.HamaMessageManagerImpl: BSPPeer address:hd1.hadoop.lab port:61002
14/03/15 16:21:05 INFO ipc.Server: IPC Server handler 4 on 61002: starting
14/03/15 16:21:05 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
14/03/15 16:21:05 INFO sync.ZKSyncClient: Initializing ZK Sync Client
14/03/15 16:21:05 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At hd1.hadoop.lab/222.195.92.69:61002
14/03/15 16:21:08 ERROR bsp.BSPTask: Error running bsp setup and bsp function.
java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
        at java.net.ServerSocket.implAccept(ServerSocket.java:478)
        at java.net.ServerSocket.accept(ServerSocket.java:446)
        at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
        at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
        at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
14/03/15 16:21:08 ERROR bsp.BSPTask: Error cleaning up after bsp executed.
java.lang.NullPointerException
        at org.apache.hama.pipes.PipesBSP.cleanup(PipesBSP.java:95)
        at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:177)
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
14/03/15 16:21:08 INFO ipc.Server: Stopping server on 61002
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 0 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 2 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server listener on 61002
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 3 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 4 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: IPC Server handler 1 on 61002: exiting
14/03/15 16:21:08 INFO ipc.Server: Stopping IPC Server Responder
14/03/15 16:21:08 ERROR bsp.BSPTask: Shutting down ping service.
14/03/15 16:21:08 FATAL bsp.GroomServer: Error running child
java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
        at java.net.ServerSocket.implAccept(ServerSocket.java:478)
        at java.net.ServerSocket.accept(ServerSocket.java:446)
        at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
        at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
        at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:375)
        at java.net.ServerSocket.implAccept(ServerSocket.java:478)
        at java.net.ServerSocket.accept(ServerSocket.java:446)
        at org.apache.hama.pipes.PipesApplication.start(PipesApplication.java:286)
        at org.apache.hama.pipes.PipesBSP.setup(PipesBSP.java:43)
        at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
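
For context on the exception itself: in the trace above the timeout is raised
by ServerSocket.accept(), called from PipesApplication.start. An accept timeout
means the listening side received no incoming connection before the socket's
timeout expired. That can happen even when both ends are on the same host, for
example if the connecting process never starts. Below is a minimal,
self-contained Java sketch of that behaviour. It is not Hama's code; the class
name AcceptTimeoutDemo, the method waitForClient and the timeout values are
made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class, not part of Hama.
class AcceptTimeoutDemo {

    /**
     * Listens on an ephemeral loopback port and waits up to timeoutMillis
     * for a client. Returns true if a client connected in time, false if
     * accept() gave up with SocketTimeoutException.
     */
    static boolean waitForClient(int timeoutMillis, boolean startClient) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            // Without this call accept() would block indefinitely.
            server.setSoTimeout(timeoutMillis);

            if (startClient) {
                int port = server.getLocalPort();
                // Stand-in for the process that is expected to connect back.
                new Thread(() -> {
                    try (Socket s = new Socket(loopback, port)) {
                        // Connected; nothing to send in this sketch.
                    } catch (IOException ignored) {
                        // Ignored: the server side reports the outcome.
                    }
                }).start();
            }

            try (Socket accepted = server.accept()) {
                return true;
            } catch (SocketTimeoutException e) {
                // Same exception type as in the log: nobody connected in time.
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("client started: " + waitForClient(2000, true));
        System.out.println("no client:      " + waitForClient(500, false));
    }
}
```

In the sketch the second call fails purely because no client was started, with
no firewall or network fault involved. Whether something similar happens on the
failing groomserver (for instance the Pipes child process not starting there)
is an open question that the log above does not answer.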


> PipesApplication connection to ZooKeeperSyncClientImpl always times out
> -----------------------------------------------------------------------
>
>                 Key: HAMA-890
>                 URL: https://issues.apache.org/jira/browse/HAMA-890
>             Project: Hama
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>         Environment: Hadoop 2.2.0 distribute mode
>            Reporter: lujing.zui



--
This message was sent by Atlassian JIRA
(v6.2#6252)
