[ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104496#comment-13104496 ]

Thomas Jungblut commented on HAMA-387:
--------------------------------------

I see several exceptions, but everything runs fine (Ubuntu x64, pseudo-distributed mode, 3 tasks).

For superstep 3:

{noformat} 
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /bsp/job_201109141522_0001/3
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:26:18,573 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:80)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:26:18,574 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000000_0         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0 11/09/14 15:26:18 WARN bsp.BSPPeer: Ignore for JobID/superstepcount znode is created.
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /bsp/job_201109141522_0001/3
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:26:18,575 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:80)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:26:18,576 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000001_0         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-14 15:26:18,578 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0001_000002_0 11/09/14 15:26:18 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():2 children in the list:[attempt_201109141522_0001_000000_0, attempt_20110
{noformat}
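As far as I can tell from these traces, the NodeExistsException is harmless: every task calls enterBarrier() and tries to create the same /bsp/<jobId>/<superstep> znode, so only the first create succeeds and the others fail with NodeExists, which is followed by the "Ignore for JobID/superstepcount znode is created." warning. A minimal sketch of that race, assuming a ConcurrentHashMap as a stand-in for ZooKeeper; InMemoryZNodes and SuperstepNode are hypothetical names, not Hama classes. With the real client the equivalent would be catching KeeperException.NodeExistsException around ZooKeeper.create(...) and treating it as the expected outcome instead of printing the stack trace.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical in-memory stand-in for the ZooKeeper namespace (assumption,
 * not Hama code). createIfAbsent plays the role of a persistent create that
 * would throw NodeExistsException if the path already exists.
 */
final class InMemoryZNodes {
  private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

  /** Returns true if this caller created the node, false if it already existed. */
  boolean createIfAbsent(String path, byte[] data) {
    return nodes.putIfAbsent(path, data) == null;
  }

  boolean exists(String path) {
    return nodes.containsKey(path);
  }
}

final class SuperstepNode {
  private SuperstepNode() {}

  /** Path layout as it appears in the logs: /bsp/<jobId>/<superstep>. */
  static String path(String jobId, long superstep) {
    return "/bsp/" + jobId + "/" + superstep;
  }

  /**
   * Called by every task when it enters the barrier. Exactly one caller wins
   * the race; for everyone else "already exists" is the normal result, so it
   * is reported as false rather than as an exception.
   */
  static boolean ensure(InMemoryZNodes zk, String jobId, long superstep) {
    return zk.createIfAbsent(path(jobId, superstep), new byte[0]);
  }
}
```

With three tasks calling ensure() for the same superstep, one call returns true and the other two return false, which corresponds to the two NodeExists traces above.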

For superstep 999:

{noformat} 
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before enterBarrier()
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():2 children in the list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0]
2011-09-14 15:34:58,655 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():3 children in the list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0, attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 after enterBarrier()
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():3 children in the list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0, attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 after enterBarrier()
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():3 children in the list:[attempt_201109141522_0002_000002_0, attempt_201109141522_0002_000000_0, attempt_201109141522_0002_000001_0]
2011-09-14 15:34:58,656 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 after enterBarrier()
2011-09-14 15:34:58,857 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before leaveBarrier()
2011-09-14 15:34:58,858 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 before leaveBarrier()
2011-09-14 15:34:58,859 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:1 children in the list[attempt_201109141522_0002_000002_0]
2011-09-14 15:34:58,859 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:1 children in the list[attempt_201109141522_0002_000002_0]
2011-09-14 15:34:58,894 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 before leaveBarrier()
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000002_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000002_0 after leaveBarrier()
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,895 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:0 children in the list[]
2011-09-14 15:34:58,896 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 after leaveBarrier()
2011-09-14 15:34:58,896 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:58 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000001_0 after leaveBarrier()
{noformat}

and:

{noformat}
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:59 INFO bsp.BSPPeer: =====> jobid:job_201109141522_0002 taskid:attempt_201109141522_0002_000000_0 before enterBarrier()
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 11/09/14 15:34:59 WARN bsp.BSPPeer: Ignore for JobID/superstepcount znode is created.
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0 org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /bsp/job_201109141522_0002/999
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-14 15:34:59,101 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:394)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:309)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.hama.examples.RandBench$RandBSP.bsp(RandBench.java:67)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000001_0 11/09/14 15:34:59 INFO examples.RandBench$RandBSP: ubuntu.ubuntu-domain:61001 to ubuntu.ubuntu-domain:61001 : 512
2011-09-14 15:34:59,102 INFO org.apache.hama.bsp.TaskRunner: attempt_201109141522_0002_000000_0         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
{noformat}
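The superstep 999 logs follow a double-barrier pattern: in enterBarrier() the children list grows until it holds all 3 attempts, and in leaveBarrier() it shrinks back to an empty list. Below is a minimal in-memory sketch of that behaviour, assuming a shared Set in place of the ZooKeeper children; SuperstepBarrier is a hypothetical class, not Hama's BSPPeer. The allEntered/allLeft flags are latched so that a fast task that has already left cannot strand a slower task that is still waiting to see the full count (the ZooKeeper recipes documentation handles the same race in its double barrier with a separate "ready" node).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical single-use barrier for one superstep (assumption, not Hama
 * code). The set stands in for the children of /bsp/<jobId>/<superstep>.
 */
final class SuperstepBarrier {
  private final int parties;
  private final Set<String> children = new TreeSet<>();
  private boolean allEntered = false; // latched once every task has registered
  private boolean allLeft = false;    // latched once every task has deregistered

  SuperstepBarrier(int parties) {
    this.parties = parties;
  }

  /** Registers taskId, blocks until all parties are in; returns the count seen on registering. */
  synchronized int enter(String taskId) throws InterruptedException {
    children.add(taskId);
    int seen = children.size();
    if (seen == parties) {
      allEntered = true;
      notifyAll();
    }
    while (!allEntered) {
      wait();
    }
    return seen;
  }

  /** Deregisters taskId, blocks until everyone has left; returns the count remaining on removal. */
  synchronized int leave(String taskId) throws InterruptedException {
    children.remove(taskId);
    int remaining = children.size();
    if (remaining == 0) {
      allLeft = true;
      notifyAll();
    }
    while (!allLeft) {
      wait();
    }
    return remaining;
  }

  /** Snapshot of the currently registered task IDs. */
  synchronized List<String> children() {
    return new ArrayList<>(children);
  }
}
```

One instance would be shared by all tasks of a superstep; each task calls enter(taskId) and later leave(taskId), mirroring the enterBarrier()/leaveBarrier() pairs in the log.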

Is this helpful for you?

> Advanced Barrier Synchronization
> --------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-387_v02.patch, HAMA-387_v03.patch, 
> HAMA-387_v04.patch, new.patch, sleepless.patch, x.PNG, x.patch
>
>
> I think the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> to check ownership and validity.
> Currently they are named by hostname, but in the future multiple tasks may
> run on a single groom server.
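For the lock naming proposed in the description, here is a sketch of a lock znode that carries all three fields and can be checked for ownership. The /bsp/<jobId>/<superstep>/<taskId> layout is an assumption inferred from the paths and children lists in the logs above, and BarrierLock with its methods is hypothetical.

```java
import java.util.Objects;

/**
 * Hypothetical lock identity carrying job ID, owning task ID and superstep
 * count. Assumed layout: /bsp/<jobId>/<superstep>/<taskId>
 */
final class BarrierLock {
  final String jobId;
  final long superstep;
  final String taskId;

  BarrierLock(String jobId, long superstep, String taskId) {
    this.jobId = Objects.requireNonNull(jobId);
    this.superstep = superstep;
    this.taskId = Objects.requireNonNull(taskId);
  }

  String path() {
    return "/bsp/" + jobId + "/" + superstep + "/" + taskId;
  }

  /** Parses a path in the assumed layout; throws IllegalArgumentException otherwise. */
  static BarrierLock parse(String path) {
    // The leading "/" yields an empty first element: ["", "bsp", job, step, task]
    String[] parts = path.split("/");
    if (parts.length != 5 || !parts[0].isEmpty() || !parts[1].equals("bsp")) {
      throw new IllegalArgumentException("unexpected lock path: " + path);
    }
    // Long.parseLong throws NumberFormatException, a subclass of IllegalArgumentException
    return new BarrierLock(parts[2], Long.parseLong(parts[3]), parts[4]);
  }

  /** True only if job, superstep and owning task all match. */
  boolean isOwnedBy(String jobId, long superstep, String taskId) {
    return this.jobId.equals(jobId)
        && this.superstep == superstep
        && this.taskId.equals(taskId);
  }
}
```

Unlike a hostname, this name stays unique when several tasks of the same job run on one groom server, and a stale lock from an earlier superstep fails the isOwnedBy check.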

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira