[ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107570#comment-13107570 ]

Edward J. Yoon commented on HAMA-387:
-------------------------------------

The job hangs again when running the patch test.

{code}
root@Cnode1:/usr/local/src/hama-trunk# core/bin/hama jar examples/target/hama-exampleSNAPSHOT.jar bench 160 10000 64
11/09/19 09:34:31 DEBUG bsp.BSPJobClient: BSPJobClient.submitJobDir: hdfs://hnode15:9/bsp/system/submit_z5c7vt
11/09/19 09:34:31 INFO bsp.BSPJobClient: Running job: job_201109190912_0005
11/09/19 09:34:34 INFO bsp.BSPJobClient: Current supersteps number: 0
11/09/19 09:34:40 INFO bsp.BSPJobClient: Current supersteps number: 1
11/09/19 09:34:43 INFO bsp.BSPJobClient: Current supersteps number: 3
11/09/19 09:34:46 INFO bsp.BSPJobClient: Current supersteps number: 5
11/09/19 09:34:49 INFO bsp.BSPJobClient: Current supersteps number: 6
11/09/19 09:34:52 INFO bsp.BSPJobClient: Current supersteps number: 8
11/09/19 09:34:55 INFO bsp.BSPJobClient: Current supersteps number: 10
11/09/19 09:34:58 INFO bsp.BSPJobClient: Current supersteps number: 12
11/09/19 09:35:01 INFO bsp.BSPJobClient: Current supersteps number: 13
11/09/19 09:35:04 INFO bsp.BSPJobClient: Current supersteps number: 14

----

2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():45 children in the list:[attempt_201109190912_0005_000020_0, attempt_201109190912_0005_000005_0, 
attempt_201109190912_0005_000030_0, attempt_201109190912_0005_000021_0, 
attempt_201109190912_0005_000023_0, attempt_201109190912_0005_000004_0, 
attempt_201109190912_0005_000010_0, attempt_201109190912_0005_000014_0, 
attempt_201109190912_0005_000015_0, attempt_201109190912_0005_000039_0, 
attempt_201109190912_0005_000006_0, attempt_201109190912_0005_000007_0, 
attempt_201109190912_0005_000019_0, attempt_201109190912_0005_000044_0, 
attempt_201109190912_0005_000024_0, attempt_201109190912_0005_000013_0, 
attempt_201109190912_0005_000025_0, attempt_201109190912_0005_000016_0, 
attempt_201109190912_0005_000034_0, attempt_201109190912_0005_000042_0, 
attempt_201109190912_0005_000026_0, attempt_201109190912_0005_000035_0, 
attempt_201109190912_0005_000008_0, attempt_201109190912_0005_000018_0, 
attempt_201109190912_0005_000033_0, attempt_201109190912_0005_000009_0, 
attempt_201109190912_0005_000002_0, attempt_201109190912_0005_000041_0, 
attempt_201109190912_0005_000036_0, attempt_201109190912_0005_000012_0, 
attempt_201109190912_0005_000003_0, attempt_201109190912_0005_000011_0, 
attempt_201109190912_0005_000038_0, attempt_201109190912_0005_000029_0, 
attempt_201109190912_0005_000028_0, attempt_201109190912_0005_000040_0, 
attempt_201109190912_0005_000017_0, attempt_201109190912_0005_000043_0, 
attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000000_0, 
attempt_201109190912_0005_000001_0, attempt_201109190912_0005_000031_0, 
attempt_201109190912_0005_000037_0, attempt_201109190912_0005_000022_0, 
attempt_201109190912_0005_000032_0]
2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: =====> jobid:job_201109190912_0005 taskid:attempt_201109190912_0005_000005_0 after enterBarrier()
2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000003_0 11/09/19 09:35:07 INFO bsp.BSPPeer: =====> jobid:job_201109190912_0005 taskid:attempt_201109190912_0005_000003_0 after enterBarrier()
2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: =====> jobid:job_201109190912_0005 taskid:attempt_201109190912_0005_000005_0 before leaveBarrier()
2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:11 children in the list[attempt_201109190912_0005_000007_0, attempt_201109190912_0005_000044_0, 
attempt_201109190912_0005_000018_0, attempt_201109190912_0005_000009_0, 
attempt_201109190912_0005_000041_0, attempt_201109190912_0005_000003_0, 
attempt_201109190912_0005_000011_0, attempt_201109190912_0005_000028_0, 
attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000000_0, 
attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,480 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000001_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():11 children in the list:[attempt_201109190912_0005_000007_0, attempt_201109190912_0005_000044_0, 
attempt_201109190912_0005_000018_0, attempt_201109190912_0005_000009_0, 
attempt_201109190912_0005_000041_0, attempt_201109190912_0005_000003_0, 
attempt_201109190912_0005_000011_0, attempt_201109190912_0005_000028_0, 
attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000000_0, 
attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,617 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000003_0 11/09/19 09:35:07 INFO bsp.BSPPeer: =====> jobid:job_201109190912_0005 taskid:attempt_201109190912_0005_000003_0 before leaveBarrier()
2011-09-19 09:35:07,661 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000003_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:3 children in the list[attempt_201109190912_0005_000028_0, attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,661 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000001_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():3 children in the list:[attempt_201109190912_0005_000028_0, attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,661 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:3 children in the list[attempt_201109190912_0005_000028_0, attempt_201109190912_0005_000027_0, attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,836 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000003_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:1 children in the list[attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,836 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000001_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxx enterBarrier() list.size():1 children in the list:[attempt_201109190912_0005_000001_0]
2011-09-19 09:35:07,836 INFO org.apache.hama.bsp.TaskRunner: attempt_201109190912_0005_000005_0 11/09/19 09:35:07 INFO bsp.BSPPeer: xxxxx leaveBarrier() list.size:1 children in the list[attempt_201109190912_0005_000001_0]
{code}
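
One reading of the trace above, offered as an interpretation rather than a confirmed diagnosis: task 000001 is still inside enterBarrier() while 000005 and 000003 are already in leaveBarrier(). The list that 000001 sees shrinks from 11 to 3 to 1 exactly as the leavers' list does, so a wait for all 45 children can no longer succeed, while the leavers wait for 000001's child to disappear. Assuming the barrier is built from ZooKeeper child znodes (as the "children" wording in the log suggests), the ZooKeeper recipes describe a double barrier in which the last process to enter creates a separate "ready" node that waiters watch instead of re-counting children. The sketch below is a hypothetical, single-threaded model (`BarrierTraceModel` is not Hama code and makes no ZooKeeper calls) that replays that ordering with three tasks and compares a count-based enter check with a marker-based one.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical, single-threaded model of a two-phase (enter/leave) barrier
 * whose members are the children of one barrier node. It only replays events;
 * it makes no ZooKeeper calls and is not Hama's BSPPeer code.
 */
final class BarrierTraceModel {
    private final int parties;                               // tasks expected at the barrier
    private final Set<String> children = new LinkedHashSet<>();
    private boolean ready = false;                           // sticky "ready" marker (assumed)

    BarrierTraceModel(int parties) {
        this.parties = parties;
    }

    /** A task registers its child node; the last arrival sets the marker. */
    void enter(String taskId) {
        children.add(taskId);
        if (children.size() == parties) {
            ready = true;
        }
    }

    /** A task that has passed the barrier deletes its child node. */
    void leave(String taskId) {
        children.remove(taskId);
    }

    /** Count-based enter check: passes only while all children are present. */
    boolean countCheckPasses() {
        return children.size() == parties;
    }

    /** Marker-based enter check: stays true after others start leaving. */
    boolean markerCheckPasses() {
        return ready;
    }

    /** Leave check: a leaver may finish once every child node is gone. */
    boolean allLeft() {
        return children.isEmpty();
    }

    int size() {
        return children.size();
    }

    public static void main(String[] args) {
        BarrierTraceModel b = new BarrierTraceModel(3);
        b.enter("t0");
        b.enter("t1");
        b.enter("t2");
        // t0 and t2 observed size() == 3 and moved on to the leave phase.
        b.leave("t0");
        b.leave("t2");
        // The slow task t1 re-reads the children only now.
        System.out.println("children seen by t1: " + b.size());              // 1
        System.out.println("count check passes: " + b.countCheckPasses());   // false: t1 waits
        System.out.println("marker check passes: " + b.markerCheckPasses()); // true: t1 proceeds
        System.out.println("leavers done: " + b.allLeft());                  // false until t1 leaves
    }
}
```

With the count check, t1 keeps waiting and the leavers never observe allLeft(); with the marker check, t1 proceeds, leaves, and the barrier drains. The model keeps one marker forever; a per-superstep barrier would need a fresh marker (or barrier node) for each superstep.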

> Advanced Barrier Synchronization
> --------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-387_v02.patch, HAMA-387_v03.patch, HAMA-387_v04.patch, new.patch, sleepless.patch, x.PNG, x.patch
>
>
> I think the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> so that ownership and validity can be checked.
> Currently lock files are named by hostname, but in the future multiple tasks
> may run on one groomserver.
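
The three fields listed above can be carried in the lock name itself. A minimal sketch, assuming a hypothetical `LockFileName` class and a `<jobId>/<taskId>/<superstep>` layout (neither is Hama's actual naming scheme): two tasks on the same groomserver get distinct names because the task ID, not the hostname, identifies the owner, and a lock left over from an earlier superstep fails validation.

```java
/**
 * Hypothetical lock-file name carrying job ID, owner task ID, and superstep.
 * The "job/task/superstep" layout is an illustrative assumption only.
 */
final class LockFileName {
    final String jobId;
    final String taskId;
    final long superstep;

    LockFileName(String jobId, String taskId, long superstep) {
        if (jobId.isEmpty() || taskId.isEmpty()
                || jobId.contains("/") || taskId.contains("/") || superstep < 0) {
            throw new IllegalArgumentException("bad lock fields");
        }
        this.jobId = jobId;
        this.taskId = taskId;
        this.superstep = superstep;
    }

    /** Encodes the fields as a relative path, e.g. job/attempt/14. */
    String encode() {
        return jobId + "/" + taskId + "/" + superstep;
    }

    /** Parses a name produced by encode(); rejects anything else. */
    static LockFileName parse(String name) {
        String[] parts = name.split("/", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("not a lock name: " + name);
        }
        // Long.parseLong throws NumberFormatException (an IllegalArgumentException) on bad input.
        return new LockFileName(parts[0], parts[1], Long.parseLong(parts[2]));
    }

    /** Ownership check: the lock was created by this task of this job. */
    boolean ownedBy(String expectedJob, String expectedTask) {
        return jobId.equals(expectedJob) && taskId.equals(expectedTask);
    }

    /** Validation check: the lock belongs to this job and the current superstep. */
    boolean validFor(String expectedJob, long currentSuperstep) {
        return jobId.equals(expectedJob) && superstep == currentSuperstep;
    }

    public static void main(String[] args) {
        LockFileName lock = new LockFileName(
            "job_201109190912_0005", "attempt_201109190912_0005_000005_0", 14);
        String name = lock.encode();
        System.out.println(name); // job_201109190912_0005/attempt_201109190912_0005_000005_0/14
        LockFileName back = LockFileName.parse(name);
        System.out.println(back.ownedBy("job_201109190912_0005",
            "attempt_201109190912_0005_000005_0"));                    // true
        System.out.println(back.validFor("job_201109190912_0005", 15)); // false: stale superstep
    }
}
```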

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

