[ 
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103311#comment-13103311
 ] 

ChiaHung Lin commented on HAMA-387:
-----------------------------------

I see there are two issues here. 

First, sync() may hang. From the information I have received, the problem 
seems to come from the superstep handling, as we discussed last time. Have we 
tested this already? If not, is there more detailed information (e.g. steps to 
reproduce the problem) so that others can help test whether adding the 
superstep would fix it?

Second, the long-running process. It seems to me this is more of a 
performance issue (not a showstopper). It could probably be improved by using 
a message tree [1] or by scheduling tasks with roughly equal computation load. 
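
To make the message tree idea concrete, here is a minimal sketch of one 
possible layout, assuming the tasks are numbered 0..n-1 and arranged in a 
k-ary tree. Each task would wait for an "arrived" message from its children, 
notify its parent, and the root would then release everyone back down the 
tree. BarrierTree and its methods are hypothetical names for illustration, 
not Hama API, and the actual messaging is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama: lays out tasks 0..n-1 as a k-ary tree
// so that barrier messages travel along tree edges instead of to one coordinator.
final class BarrierTree {
    private BarrierTree() {}

    /** Parent of task i, or -1 for the root (task 0). */
    static int parent(int i, int fanout) {
        return i == 0 ? -1 : (i - 1) / fanout;
    }

    /** Children of task i among n tasks, in increasing order. */
    static List<Integer> children(int i, int fanout, int n) {
        List<Integer> out = new ArrayList<>();
        for (int c = 0; c < fanout; c++) {
            long child = (long) i * fanout + 1 + c; // long avoids int overflow
            if (child < n) {
                out.add((int) child);
            }
        }
        return out;
    }

    /** Number of tree edges between the root and the deepest task. */
    static int depth(int n, int fanout) {
        int d = 0;
        // The highest-numbered task sits on the deepest level of the tree.
        for (int i = n - 1; i > 0; i = parent(i, fanout)) {
            d++;
        }
        return d;
    }
}
```

With this layout the barrier needs roughly 2 * depth(n, k) message hops (up 
to the root and back down), and no task handles more than k + 1 neighbours.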

Personally I think the first problem is more important and we should fix it 
first. 

[1] Practical Barrier Synchronisation. 
ftp://ftp.comlab.ox.ac.uk/pub/Documents/techpapers/Jonathan.Hill/HillSkill_barrier.ps.Z


> Advanced Barrier Synchronization
> --------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.4.0
>
>         Attachments: HAMA-387_v02.patch, HAMA-387_v03.patch, 
> HAMA-387_v04.patch, new.patch, sleepless.patch, x.PNG
>
>
> I think the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> so that ownership and validity can be checked.
> Currently lock files are named by hostname, but multiple tasks may run on 
> a single groomserver in the future. 
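
The three fields listed in the issue description could be carried in the lock 
file name itself. The following is only a sketch under assumptions: the 
BarrierLock class, the "~" separator, and the ".lock" suffix are hypothetical 
choices for illustration, not Hama's actual lock-file format.

```java
// Hypothetical sketch, not Hama's real lock-file handling: encodes the job ID,
// the owning task ID, and the superstep count into a lock file name.
final class BarrierLock {
    private static final String SEP = "~";        // assumed separator
    private static final String SUFFIX = ".lock"; // assumed suffix

    final String jobId;
    final String taskId;
    final long superstep;

    BarrierLock(String jobId, String taskId, long superstep) {
        if (jobId.contains(SEP) || taskId.contains(SEP)) {
            throw new IllegalArgumentException("IDs must not contain '" + SEP + "'");
        }
        if (superstep < 0) {
            throw new IllegalArgumentException("superstep must be non-negative");
        }
        this.jobId = jobId;
        this.taskId = taskId;
        this.superstep = superstep;
    }

    /** File name of the form jobId~taskId~superstep.lock. */
    String fileName() {
        return jobId + SEP + taskId + SEP + superstep + SUFFIX;
    }

    /** Parses a name produced by fileName(); rejects anything else. */
    static BarrierLock parse(String name) {
        if (!name.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a lock file: " + name);
        }
        String body = name.substring(0, name.length() - SUFFIX.length());
        String[] parts = body.split(SEP, -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed lock name: " + name);
        }
        // Long.parseLong throws NumberFormatException on a bad superstep field.
        return new BarrierLock(parts[0], parts[1], Long.parseLong(parts[2]));
    }

    /** True if this lock belongs to the given task in the given superstep. */
    boolean isValidFor(String jobId, String taskId, long superstep) {
        return this.jobId.equals(jobId)
                && this.taskId.equals(taskId)
                && this.superstep == superstep;
    }
}
```

Because the name no longer depends on the hostname, two tasks on the same 
groomserver get distinct lock files, and a lock left over from an earlier 
superstep fails the isValidFor check instead of being mistaken for a current 
one.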

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
