[ 
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037788#comment-13037788
 ] 

ChiaHung Lin commented on HAMA-387:
-----------------------------------

Does cnode14 eventually enter the 98th superstep? From the log, it seems that 
cnode14 is about to enter the 98th superstep but has not yet logged that it 
did. My understanding is that barrier synchronization waits until all 
processes have reached the barrier before any of them proceeds. Therefore, if 
cnode14 logs `enter the 98 barrier' later on and all nodes then leave the 
barrier, the result looks OK. 
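
A minimal sketch of that barrier behaviour, using java.util.concurrent.CyclicBarrier 
as a stand-in for the ZooKeeper-based barrier (an assumption for illustration; 
this is not the code in the patch, and the class name, method name and log 
strings are hypothetical). No task logs "leave" for a superstep until every 
task has logged "enter" for it: 

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration only; not Hama's actual barrier implementation.
class BarrierSketch {
    /** Runs `tasks` threads through `supersteps` barriers and returns the event log. */
    static List<String> run(int tasks, int supersteps) throws InterruptedException {
        List<String> log = new CopyOnWriteArrayList<>();
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        Thread[] threads = new Thread[tasks];
        for (int t = 0; t < tasks; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        log.add("enter " + s + " task" + id);
                        barrier.await(); // blocks until every task has entered
                        log.add("leave " + s + " task" + id);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        return log;
    }

    public static void main(String[] args) throws InterruptedException {
        // The interleaving within a superstep varies from run to run.
        run(3, 2).forEach(System.out::println);
    }
}
```

A late task, like cnode14 in the log, only delays the "leave" entries of the 
other tasks; it does not break the ordering. 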

Also, a quick look at the patch shows that the znode is created as EPHEMERAL 
instead of EPHEMERAL_SEQUENTIAL; this avoids the problem in the scenario where 
a client process disconnects and then reconnects, which with 
EPHEMERAL_SEQUENTIAL leads to a name with a monotonically increasing number 
appended. 
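
The naming difference can be sketched with a small in-memory simulation. It 
does not talk to ZooKeeper. The class and method names are hypothetical, and 
the counter is simplified. The only ZooKeeper behaviour it mirrors is that 
sequential nodes get a 10-digit, zero-padded, monotonically increasing suffix: 

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for one parent znode; not the ZooKeeper API.
class ZnodeNameSketch {
    private int counter = 0; // simplified per-parent sequence counter
    private final Set<String> children = new HashSet<>();

    /** Creates a child and returns the name it was actually stored under. */
    String create(String name, boolean sequential) {
        String actual = sequential ? name + String.format("%010d", counter++) : name;
        if (!children.add(actual)) {
            throw new IllegalStateException("node exists: " + actual);
        }
        return actual;
    }

    /** Simulates session loss: the ephemeral node disappears. */
    void expire(String actualName) {
        children.remove(actualName);
    }

    public static void main(String[] args) {
        ZnodeNameSketch plain = new ZnodeNameSketch();
        String p1 = plain.create("cnode14", false);
        plain.expire(p1); // client disconnects
        String p2 = plain.create("cnode14", false); // client reconnects
        System.out.println(p1 + " -> " + p2); // cnode14 -> cnode14

        ZnodeNameSketch seq = new ZnodeNameSketch();
        String s1 = seq.create("cnode14", true);
        seq.expire(s1);
        String s2 = seq.create("cnode14", true);
        System.out.println(s1 + " -> " + s2); // cnode140000000000 -> cnode140000000001
    }
}
```

With the plain mode, the name stays stable across a reconnect, so peers 
counting or matching barrier nodes by name see the same entry again. 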


> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Edward J. Yoon
>             Fix For: 0.3.0
>
>         Attachments: sleepless.patch
>
>
> I think the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> so that ownership can be checked and validated.
> Currently lock files are named by hostname, but multiple tasks may run per 
> groomserver in the future. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
