[ 
https://issues.apache.org/jira/browse/HAMA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452724#comment-13452724
 ] 

Edward J. Yoon commented on HAMA-557:
-------------------------------------

{code}
edward@edward-VirtualBox:~/workspace/hama-trunk$ bin/hama jar examples/target/hama-examples-0.6.0-SNAPSHOT.jar bench 5 5 5
12/09/11 14:04:27 INFO bsp.BSPJobClient: Running job: job_201209111359_0002
12/09/11 14:04:30 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:04:36 INFO bsp.BSPJobClient: Current supersteps number: 2
12/09/11 14:04:45 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:04:51 INFO bsp.BSPJobClient: Current supersteps number: 3
12/09/11 14:05:06 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
12/09/11 14:05:24 INFO bsp.BSPJobClient: Current supersteps number: 0
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 
GMT
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:host.name=edward-VirtualBox
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:java.version=1.7.0_06
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:java.vendor=Oracle Corporation
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client 
environment:java.class.path=/home/edward/workspace/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/edward/workspace/hama-trunk/bin/../core/target/classes:/home/edward/workspace/hama-trunk/bin/../graph/target/classes:/home/edward/workspace/hama-trunk/bin/../hama-**.jar:/home/edward/workspace/hama-trunk/bin/../lib/ant-1.7.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/avro-1.6.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-cli-1.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-lang-2.6.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-math3-3.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/guava-10.0.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/junit-4.8.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/log4j-1.2.16.jar:/home/edward/workspace/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/home/edward/workspace/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/home/edward/workspace/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/home/edward/workspace/hama-trunk/bin/..
/lib/slf4j-log4j12-1.5.8.jar:/home/edward/workspace/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/zookeeper-3.3.3.jar::/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work/classes:/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client 
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:java.io.tmpdir=/tmp
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:java.compiler=<NA>
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:os.name=Linux
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:os.arch=amd64
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:os.version=3.2.0-29-generic
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:user.name=edward
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client environment:user.home=/home/edward
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Client 
environment:user.dir=/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper: 
Initiating client connection, connectString=edward-VirtualBox:21810 
sessionTimeout=1200000 
watcher=org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl@e33ad7
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO 
zookeeper.ClientCnxn: Opening socket connection to server 
edward-VirtualBox/127.0.1.1:21810
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO 
zookeeper.ClientCnxn: Socket connection established to 
edward-VirtualBox/127.0.1.1:21810, initiating session
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO sync.ZKSyncClient: 
Initializing ZK Sync Client
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO 
sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At 
edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO 
zookeeper.ClientCnxn: Session establishment complete on server 
edward-VirtualBox/127.0.1.1:21810, sessionid = 0x139b3b269e10013, negotiated 
timeout = 1200000
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO 
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61003
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO 
ipc.NettyTransceiver: [id: 0x15ee470d] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO 
ipc.NettyTransceiver: [id: 0x15ee470d, /127.0.0.1:51614 => 
edward-VirtualBox/127.0.1.1:61003] BOUND: /127.0.0.1:51614
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO 
ipc.NettyTransceiver: [id: 0x15ee470d, /127.0.0.1:51614 => 
edward-VirtualBox/127.0.1.1:61003] CONNECTED: edward-VirtualBox/127.0.1.1:61003
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] CONNECTED: 
/127.0.0.1:50522
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] CONNECTED: 
/127.0.0.1:50523
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ft.AsyncRcvdMsgCheckpointImpl: Creating path 
checkpoint/job_201209111359_0002/4/2
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 => 
edward-VirtualBox/127.0.1.1:61004] BOUND: /127.0.0.1:50523
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 => 
edward-VirtualBox/127.0.1.1:61004] CONNECTED: edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61002
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x4362b702] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] CONNECTED: 
/127.0.0.1:50524
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x4362b702, /127.0.0.1:55795 => 
edward-VirtualBox/127.0.1.1:61002] BOUND: /127.0.0.1:55795
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x4362b702, /127.0.0.1:55795 => 
edward-VirtualBox/127.0.1.1:61002] CONNECTED: edward-VirtualBox/127.0.1.1:61002
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO sync.ZKSyncClient: 
Writing data /bsp/job_201209111359_0002/checkpoint/2
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ft.AsyncRcvdMsgCheckpointImpl: Enabled = true checkPointInterval = 1 
lastCheckPointStep = 4 getSuperstepCount() = 4
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ft.AsyncRcvdMsgCheckpointImpl: checkpointNext = true checkpointMessageCount = 0
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 ERROR bsp.BSPTask: Error 
running bsp setup and bsp function.
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated 
to test by peer 2
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO zookeeper.ZooKeeper: 
Session: 0x139b3b269e10013 closed
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
zookeeper.ClientCnxn: EventThread shut down
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :> 
edward-VirtualBox/127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :> 
edward-VirtualBox/127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :> 
edward-VirtualBox/127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: Remote peer edward-VirtualBox/127.0.1.1:61004 closed 
connection.
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO 
ipc.NettyTransceiver: Disconnecting from edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer: 
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 ERROR bsp.BSPTask: 
Shutting down ping service.
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 FATAL bsp.GroomServer: 
Error running child
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated 
to test by peer 2
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated 
to test by peer 2
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2:     at 
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
12/09/11 14:05:24 INFO bsp.BSPJobClient: Job failed.
{code}

I wanted to test your RandBench, but it always fails (with the final 
exception shown above). Am I missing something?

I used TRUNK.
                
> Implement Checkpointing service in Hama
> ---------------------------------------
>
>                 Key: HAMA-557
>                 URL: https://issues.apache.org/jira/browse/HAMA-557
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core
>    Affects Versions: 0.6.0
>            Reporter: Suraj Menon
>            Assignee: Suraj Menon
>             Fix For: 0.6.0
>
>         Attachments: HAMA-505-557-610-611-v1.patch, 
> HAMA-505-557-610-611-v2.patch, HAMA-557-ft-framework.patch
>
>
> Implement checkpointing service in Apache Hama. My patches for HAMA-533 and 
> HAMA-534 are blocked on this.
> - Checkpointing should be done as messages are either sent or received. I 
> prefer while receiving messages, as we can achieve some parallelism with 
> asynchronous messages. Please comment if you differ.
> - BSPMaster should hold the checkpoint status for each task. Checkpoint 
> status includes superstep count and file information for which checkpointing 
> is complete
> - MessageManager should notify Checkpointer of a new message at BSPPeer.
> - Implement/Reuse MessageBundle class as splitClass in BSPPeerImpl for 
> recovery in initInput.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
