[
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037788#comment-13037788
]
ChiaHung Lin commented on HAMA-387:
-----------------------------------
Does cnode14 eventually enter the 98th superstep? From the log, it looks as if
cnode14 is about to enter the 98th superstep but has not yet logged that
information. My understanding is that barrier synchronization waits until all
processes reach the barrier before any of them proceeds. Therefore, if cnode14
logs `enter the 98 barrier' later on and all nodes then leave the barrier, the
result looks OK.
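
To make the expected ordering concrete, here is a minimal sketch of the barrier
semantics described above. It uses java.util.concurrent.CyclicBarrier as a
stand-in and is not Hama's ZooKeeper-based barrier; the class name, method
names, task numbering and the 200 ms delay are all assumptions made for
illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

// Hypothetical illustration of barrier semantics; not Hama's barrier code.
class BarrierSketch {

    // Runs `tasks` threads that meet at one barrier for the given superstep
    // and returns the ordered event log ("enter ..." / "leave ...").
    static List<String> runSuperstep(int tasks, int superstep, int slowTask) {
        List<String> events = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(tasks);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    if (id == slowTask) {
                        Thread.sleep(200); // the straggler, playing the role of cnode14
                    }
                    events.add("enter " + superstep + " task" + id);
                    barrier.await(); // blocks until every task has entered
                    events.add("leave " + superstep + " task" + id);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return events;
    }

    // True if every "enter" event precedes every "leave" event.
    static boolean allEnterBeforeAnyLeave(List<String> events) {
        boolean sawLeave = false;
        for (String e : events) {
            if (e.startsWith("leave")) {
                sawLeave = true;
            } else if (sawLeave) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> events = runSuperstep(3, 98, 2);
        events.forEach(System.out::println);
        System.out.println("barrier respected: " + allEnterBeforeAnyLeave(events));
    }
}
```

In this sketch a late `enter' line from the slow task is harmless, as long as
no `leave' line for that superstep appears before it.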
Also, a quick look at the patch shows that the znode is created as EPHEMERAL
instead of EPHEMERAL_SEQUENTIAL; this eliminates the problem in the scenario
where a client process disconnects and then reconnects, which leads to the
name being appended with a monotonically increasing number.
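
The naming difference can be sketched with a toy in-memory model of the
barrier's parent znode. This is not the ZooKeeper client API: the class and
method names are hypothetical, and the retry scenario assumes the node from
the first create still exists when the client retries. The only ZooKeeper
behavior relied on is that a sequential create appends a 10-digit,
zero-padded counter to the requested name.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model of one parent znode; hypothetical, not the ZooKeeper client API.
class ZnodeNamingSketch {
    private final Set<String> children = new TreeSet<>();
    private int sequence = 0; // per-parent counter used for sequential names

    // Models CreateMode.EPHEMERAL: the name is used as given, and a second
    // create with the same name is rejected (returns false).
    boolean createEphemeral(String name) {
        return children.add(name);
    }

    // Models CreateMode.EPHEMERAL_SEQUENTIAL: a 10-digit counter is appended,
    // so every create yields a new, distinct child.
    String createEphemeralSequential(String name) {
        String actual = name + String.format("%010d", sequence++);
        children.add(actual);
        return actual;
    }

    Set<String> children() {
        return children;
    }

    public static void main(String[] args) {
        // A client creates its node, loses the connection before seeing the
        // reply, reconnects, and creates the node again.
        ZnodeNamingSketch sequential = new ZnodeNamingSketch();
        sequential.createEphemeralSequential("cnode14");
        sequential.createEphemeralSequential("cnode14");
        System.out.println("sequential: " + sequential.children());

        ZnodeNamingSketch plain = new ZnodeNamingSketch();
        plain.createEphemeral("cnode14");
        plain.createEphemeral("cnode14"); // rejected, no duplicate
        System.out.println("plain:      " + plain.children());
    }
}
```

With sequential names the same host shows up twice under the barrier, so
anything that counts children would overcount; with a plain ephemeral name the
retry cannot add a second entry.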
> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
> Key: HAMA-387
> URL: https://issues.apache.org/jira/browse/HAMA-387
> Project: Hama
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Fix For: 0.3.0
>
> Attachments: sleepless.patch
>
>
> I think the lock file must include:
> * the job ID
> * the task ID of the lock file owner
> * the current superstep count
> so that ownership and validity can be checked.
> Currently lock files are named by hostname, but multiple tasks may run per
> groomserver in the future.