[
https://issues.apache.org/jira/browse/HAMA-557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452724#comment-13452724
]
Edward J. Yoon commented on HAMA-557:
-------------------------------------
{code}
edward@edward-VirtualBox:~/workspace/hama-trunk$ bin/hama jar
examples/target/hama-examples-0.6.0-SNAPSHOT.jar bench 5 5 5
12/09/11 14:04:27 INFO bsp.BSPJobClient: Running job: job_201209111359_0002
12/09/11 14:04:30 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:04:36 INFO bsp.BSPJobClient: Current supersteps number: 2
12/09/11 14:04:45 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:04:51 INFO bsp.BSPJobClient: Current supersteps number: 3
12/09/11 14:05:06 INFO bsp.BSPJobClient: Current supersteps number: 0
12/09/11 14:05:12 INFO bsp.BSPJobClient: Current supersteps number: 4
12/09/11 14:05:24 INFO bsp.BSPJobClient: Current supersteps number: 0
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
GMT
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:host.name=edward-VirtualBox
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:java.version=1.7.0_06
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:java.vendor=Oracle Corporation
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client
environment:java.class.path=/home/edward/workspace/hama-trunk/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/edward/workspace/hama-trunk/bin/../core/target/classes:/home/edward/workspace/hama-trunk/bin/../graph/target/classes:/home/edward/workspace/hama-trunk/bin/../hama-**.jar:/home/edward/workspace/hama-trunk/bin/../lib/ant-1.7.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/ant-launcher-1.7.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/avro-1.6.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/avro-ipc-1.6.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-cli-1.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-configuration-1.7.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-httpclient-3.0.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-lang-2.6.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-logging-1.1.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/commons-math3-3.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/guava-10.0.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/hadoop-core-1.0.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/hadoop-test-1.0.0.jar:/home/edward/workspace/hama-trunk/bin/../lib/jackson-core-asl-1.9.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/jackson-mapper-asl-1.9.2.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-annotations-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jetty-util-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jsp-2.1-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/jsp-api-2.1-6.1.14.jar:/home/edward/workspace/hama-trunk/bin/../lib/junit-4.8.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/log4j-1.2.16.jar:/home/edward/workspace/hama-trunk/bin/../lib/netty-3.2.6.Final.jar:/home/edward/workspace/hama-trunk/bin/../lib/servlet-api-6.0.32.jar:/home/edward/workspace/hama-trunk/bin/../lib/slf4j-api-1.5.8.jar:/home/edward/workspace/hama-trunk/bin/..
/lib/slf4j-log4j12-1.5.8.jar:/home/edward/workspace/hama-trunk/bin/../lib/snappy-java-1.0.4.1.jar:/home/edward/workspace/hama-trunk/bin/../lib/zookeeper-3.3.3.jar::/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work/classes:/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client
environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:java.io.tmpdir=/tmp
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:java.compiler=<NA>
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:os.name=Linux
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:os.arch=amd64
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:os.version=3.2.0-29-generic
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:user.name=edward
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client environment:user.home=/home/edward
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Client
environment:user.dir=/tmp/hadoop-edward/bsp/local/groomServer/attempt_201209111359_0002_000002_2/work
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO zookeeper.ZooKeeper:
Initiating client connection, connectString=edward-VirtualBox:21810
sessionTimeout=1200000
watcher=org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl@e33ad7
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO
zookeeper.ClientCnxn: Opening socket connection to server
edward-VirtualBox/127.0.1.1:21810
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO
zookeeper.ClientCnxn: Socket connection established to
edward-VirtualBox/127.0.1.1:21810, initiating session
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO sync.ZKSyncClient:
Initializing ZK Sync Client
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO
sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper! At
edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:07 INFO
zookeeper.ClientCnxn: Session establishment complete on server
edward-VirtualBox/127.0.1.1:21810, sessionid = 0x139b3b269e10013, negotiated
timeout = 1200000
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61003
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO
ipc.NettyTransceiver: [id: 0x15ee470d] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO
ipc.NettyTransceiver: [id: 0x15ee470d, /127.0.0.1:51614 =>
edward-VirtualBox/127.0.1.1:61003] BOUND: /127.0.0.1:51614
attempt_201209111359_0002_000002_2: 12/09/11 14:05:08 INFO
ipc.NettyTransceiver: [id: 0x15ee470d, /127.0.0.1:51614 =>
edward-VirtualBox/127.0.1.1:61003] CONNECTED: edward-VirtualBox/127.0.1.1:61003
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 => /127.0.1.1:61004] CONNECTED:
/127.0.0.1:50522
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 => /127.0.1.1:61004] CONNECTED:
/127.0.0.1:50523
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ft.AsyncRcvdMsgCheckpointImpl: Creating path
checkpoint/job_201209111359_0002/4/2
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 =>
edward-VirtualBox/127.0.1.1:61004] BOUND: /127.0.0.1:50523
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 =>
edward-VirtualBox/127.0.1.1:61004] CONNECTED: edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: Connecting to edward-VirtualBox/127.0.1.1:61002
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x4362b702] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] OPEN
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] BOUND: /127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 => /127.0.1.1:61004] CONNECTED:
/127.0.0.1:50524
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x4362b702, /127.0.0.1:55795 =>
edward-VirtualBox/127.0.1.1:61002] BOUND: /127.0.0.1:55795
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x4362b702, /127.0.0.1:55795 =>
edward-VirtualBox/127.0.1.1:61002] CONNECTED: edward-VirtualBox/127.0.1.1:61002
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO sync.ZKSyncClient:
Writing data /bsp/job_201209111359_0002/checkpoint/2
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ft.AsyncRcvdMsgCheckpointImpl: Enabled = true checkPointInterval = 1
lastCheckPointStep = 4 getSuperstepCount() = 4
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ft.AsyncRcvdMsgCheckpointImpl: checkpointNext = true checkpointMessageCount = 0
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 ERROR bsp.BSPTask: Error
running bsp setup and bsp function.
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated
to test by peer 2
attempt_201209111359_0002_000002_2: at
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO zookeeper.ZooKeeper:
Session: 0x139b3b269e10013 closed
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
zookeeper.ClientCnxn: EventThread shut down
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x73e3df54, /127.0.0.1:50522 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x385c0662, /127.0.0.1:50523 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :>
edward-VirtualBox/127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :>
edward-VirtualBox/127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: [id: 0x02190419, /127.0.0.1:50523 :>
edward-VirtualBox/127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: Remote peer edward-VirtualBox/127.0.1.1:61004 closed
connection.
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO
ipc.NettyTransceiver: Disconnecting from edward-VirtualBox/127.0.1.1:61004
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] DISCONNECTED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] UNBOUND
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 INFO ipc.NettyServer:
[id: 0x05dd06de, /127.0.0.1:50524 :> /127.0.1.1:61004] CLOSED
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 ERROR bsp.BSPTask:
Shutting down ping service.
attempt_201209111359_0002_000002_2: 12/09/11 14:05:09 FATAL bsp.GroomServer:
Error running child
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated
to test by peer 2
attempt_201209111359_0002_000002_2: at
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
attempt_201209111359_0002_000002_2: java.lang.RuntimeException: Error generated
to test by peer 2
attempt_201209111359_0002_000002_2: at
org.apache.hama.examples.RandBench$RandBSP.compute(RandBench.java:76)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.SuperstepBSP.bsp(SuperstepBSP.java:69)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
attempt_201209111359_0002_000002_2: at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
12/09/11 14:05:24 INFO bsp.BSPJobClient: Job failed.
{code}
I wanted to test your RandBench, but it always fails (at the final exception). Am I
missing something?
I used TRUNK.
> Implement Checkpointing service in Hama
> ---------------------------------------
>
> Key: HAMA-557
> URL: https://issues.apache.org/jira/browse/HAMA-557
> Project: Hama
> Issue Type: Sub-task
> Components: bsp core
> Affects Versions: 0.6.0
> Reporter: Suraj Menon
> Assignee: Suraj Menon
> Fix For: 0.6.0
>
> Attachments: HAMA-505-557-610-611-v1.patch,
> HAMA-505-557-610-611-v2.patch, HAMA-557-ft-framework.patch
>
>
> Implement checkpointing service in Apache Hama. My patches for HAMA-533 and
> HAMA-534 are blocked on this.
> - Checkpointing should be done as messages are either sent or received. I
> prefer doing it while receiving messages, as we can achieve some parallelism
> with asynchronous messages. Please comment if you disagree.
> - BSPMaster should hold the checkpoint status for each task. Checkpoint
> status includes the superstep count and file information for which
> checkpointing is complete.
> - MessageManager should notify Checkpointer of a new message at BSPPeer.
> - Implement/Reuse MessageBundle class as splitClass in BSPPeerImpl for
> recovery in initInput.
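
The design points in the quoted description can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from the attached patches: every class and method name in it (MessageReceivedListener, ReceiveSideCheckpointer, CheckpointStatus, MasterCheckpointTable) is hypothetical, and a HashMap stands in for the distributed file system. Only the path layout checkpoint/<job>/<superstep>/<peer> is taken from the "Creating path" line in the log above; everything else is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback: the message manager notifies the checkpointer of each received message. */
interface MessageReceivedListener {
    void onMessageReceived(String message);
}

/** Hypothetical per-task status the master could hold: superstep count plus file information. */
final class CheckpointStatus {
    final long superstep;
    final String path;

    CheckpointStatus(long superstep, String path) {
        this.superstep = superstep;
        this.path = path;
    }
}

/** Buffers messages as they arrive and writes them out when a superstep completes. */
final class ReceiveSideCheckpointer implements MessageReceivedListener {
    private final String jobId;
    private final int peerId;
    private final List<String> pending = new ArrayList<String>();
    // In-memory stand-in for the file system (assumption for this sketch only).
    private final Map<String, List<String>> store = new HashMap<String, List<String>>();

    ReceiveSideCheckpointer(String jobId, int peerId) {
        this.jobId = jobId;
        this.peerId = peerId;
    }

    @Override
    public void onMessageReceived(String message) {
        pending.add(message);
    }

    /** Persists the buffered messages and returns the status to report to the master. */
    CheckpointStatus completeSuperstep(long superstep) {
        // Layout mirrors the log line "Creating path checkpoint/<job>/<superstep>/<peer>".
        String path = "checkpoint/" + jobId + "/" + superstep + "/" + peerId;
        store.put(path, new ArrayList<String>(pending));
        pending.clear();
        return new CheckpointStatus(superstep, path);
    }

    /** Returns the messages saved under the given status, or an empty list if none exist. */
    List<String> recover(CheckpointStatus status) {
        List<String> saved = store.get(status.path);
        return saved == null ? new ArrayList<String>() : new ArrayList<String>(saved);
    }
}

/** Hypothetical master-side table holding the latest completed checkpoint per task. */
final class MasterCheckpointTable {
    private final Map<Integer, CheckpointStatus> latest = new HashMap<Integer, CheckpointStatus>();

    void report(int taskId, CheckpointStatus status) {
        latest.put(taskId, status);
    }

    CheckpointStatus latestFor(int taskId) {
        return latest.get(taskId);
    }
}

final class CheckpointSketch {
    public static void main(String[] args) {
        ReceiveSideCheckpointer checkpointer = new ReceiveSideCheckpointer("job_201209111359_0002", 2);
        MasterCheckpointTable master = new MasterCheckpointTable();

        checkpointer.onMessageReceived("m1");
        checkpointer.onMessageReceived("m2");
        CheckpointStatus status = checkpointer.completeSuperstep(4);
        master.report(2, status);

        // A restarted task would ask the master where to resume and reload its messages.
        CheckpointStatus resumeFrom = master.latestFor(2);
        System.out.println(resumeFrom.path + " -> " + checkpointer.recover(resumeFrom));
    }
}
```

Buffering on the receive side keeps the sender path untouched, which is the parallelism argument made in the description. A real implementation would write to the file system asynchronously rather than copy a list in memory.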