[ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104496#comment-13104496 ]
Thomas Jungblut commented on HAMA-387:
--------------------------------------
I see several exceptions, but everything runs fine (Ubuntu x64 in
pseudo-distributed mode with 3 tasks).
For superstep 3:
{noformat}
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists for /bsp/job_201109141522_0001/3
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:80)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000000_0 at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 11/09/14 15:26:18 WARN bsp.BSPPeer: Ignore
for JobID/superstepcount znode is created.
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists for /bsp/job_201109141522_0001/3
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:80)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000001_0 at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-14 15:26:18,578 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0001_000002_0 11/09/14 15:26:18 INFO bsp.BSPPeer: xxxx
enterBarrier() list.size():2 children in the
list:[attempt_201109141522_0001_000000_0, attempt_20110
{noformat}
For superstep 999:
{noformat}
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before
enterBarrier()
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx
enterBarrier() list.size():2 children in the
list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0]
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx
enterBarrier() list.size():3 children in the
list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0,
attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 after
enterBarrier()
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx
enterBarrier() list.size():3 children in the
list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0,
attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 after
enterBarrier()
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx
enterBarrier() list.size():3 children in the
list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0,
attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 after
enterBarrier()
2011-09-14 15:34:58,857 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before
leaveBarrier()
2011-09-14 15:34:58,858 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 before
leaveBarrier()
2011-09-14 15:34:58,859 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx
leaveBarrier() list.size:1 children in the
list[attempt_201109141522_0002_000002_0]
2011-09-14 15:34:58,859 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx
leaveBarrier() list.size:1 children in the
list[attempt_201109141522_0002_000002_0]
2011-09-14 15:34:58,894 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 before
leaveBarrier()
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx
leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 after
leaveBarrier()
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx
leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx
leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,896 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 after
leaveBarrier()
2011-09-14 15:34:58,896 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 after
leaveBarrier()
{noformat}
and
{noformat}
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:59 INFO bsp.BSPPeer: =====>
jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before
enterBarrier()
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 11/09/14 15:34:59 WARN bsp.BSPPeer: Ignore
for JobID/superstepcount znode is created.
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode =
NodeExists for /bsp/job_201109141522_0002/999
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.hama.examples.RandBench$RandBSP.bsp(RandBench.java:67)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO
examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to
ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner:
attempt_201109141522_0002_000000_0 at
org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
{noformat}
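The NodeExistsException traces above are consistent with several tasks racing to create the same superstep znode, and the "Ignore for JobID/superstepcount znode is created" warning suggests the failure is treated as harmless. A minimal sketch of that pattern follows, using an in-memory stand-in rather than a real ZooKeeper client. InMemoryZk, NodeExistsException and BarrierSketch are hypothetical names for illustration, not Hama or ZooKeeper classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ZooKeeper's NodeExistsException. */
class NodeExistsException extends Exception {
    NodeExistsException(String path) {
        super("NodeExists for " + path);
    }
}

/** Hypothetical in-memory znode store; create() fails if the path exists. */
class InMemoryZk {
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    void create(String path) throws NodeExistsException {
        // Set.add is atomic here, so exactly one concurrent caller succeeds.
        if (!nodes.add(path)) {
            throw new NodeExistsException(path);
        }
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }
}

class BarrierSketch {
    /**
     * Makes sure the superstep znode exists.
     * Returns true if this caller created it, false if another task did.
     */
    static boolean ensureSuperstepNode(InMemoryZk zk, String jobId, long superstep) {
        // Path layout assumed from the logs: /bsp/<jobId>/<superstep>
        String path = "/bsp/" + jobId + "/" + superstep;
        try {
            zk.create(path);
            return true;
        } catch (NodeExistsException e) {
            // Expected when tasks enter the barrier at the same time;
            // the node is there either way, so there is nothing to report.
            return false;
        }
    }
}
```

With this shape, only the first task to reach the barrier creates the node and the others continue quietly, so the stack trace would not need to be logged at all.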
Is this helpful for you?
> Advanced Barrier Synchronization
> --------------------------------
>
> Key: HAMA-387
> URL: https://issues.apache.org/jira/browse/HAMA-387
> Project: Hama
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.3.0
> Reporter: Edward J. Yoon
> Assignee: Edward J. Yoon
> Fix For: 0.4.0
>
> Attachments: HAMA-387_v02.patch, HAMA-387_v03.patch,
> HAMA-387_v04.patch, new.patch, sleepless.patch, x.PNG, x.patch
>
>
> I think the lock file must include:
> * the job ID
> * the task ID of the lock file owner
> * the current superstep count
> so that ownership can be checked and the lock validated.
> Currently lock files are named by hostname, but multiple tasks may run
> per GroomServer in the future.
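
As an illustration of the naming proposed in the description, here is a small sketch of a lock name that carries the job ID, the owning task ID and the superstep count. The LockName class and the /bsp/<jobId>/<superstep>/<taskId> layout are assumptions inferred from the paths and child lists in the logs, not the layout used by any of the attached patches.

```java
/** Hypothetical value type holding the three fields the description lists. */
final class LockName {
    final String jobId;
    final String taskId;
    final long superstep;

    LockName(String jobId, String taskId, long superstep) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.superstep = superstep;
    }

    /** Assumed layout: /bsp/<jobId>/<superstep>/<taskId>. */
    String toPath() {
        return "/bsp/" + jobId + "/" + superstep + "/" + taskId;
    }

    /** Parses a path produced by toPath(); rejects anything else. */
    static LockName parse(String path) {
        String[] parts = path.split("/");
        // parts[0] is empty because the path starts with '/'.
        if (parts.length != 5 || !parts[0].isEmpty() || !"bsp".equals(parts[1])) {
            throw new IllegalArgumentException("Unexpected lock path: " + path);
        }
        return new LockName(parts[2], parts[4], Long.parseLong(parts[3]));
    }

    /** Ownership check: the lock belongs to the task named in the path. */
    boolean isOwnedBy(String candidateTaskId) {
        return taskId.equals(candidateTaskId);
    }
}
```

Because the task ID is part of the name, two tasks on the same host get distinct lock nodes, which a hostname-based name cannot guarantee.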