[
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051131#comment-13051131
]
Thomas Jungblut commented on HAMA-387:
--------------------------------------
Hmm crap.
Can we add a test case? This should be easily reproducible.
And what if we prevent peers from entering the barrier if the ZooKeeper lock
still exists?
For example, like this:
{noformat}
protected boolean enterBarrier() throws KeeperException, InterruptedException {
  LOG.debug("[" + getPeerName() + "] enter the enterbarrier");
  try {
    // Wait until this peer's lock node from the previous superstep is gone.
    while (zk.exists(bspRoot + "/" + getPeerName(), false) != null) {
      Thread.sleep(500L);
    }
    // Create a fresh ephemeral lock node holding the current superstep count.
    zk.create(bspRoot + "/" + getPeerName(),
        Bytes.toBytes(this.getSuperstepCount()), Ids.OPEN_ACL_UNSAFE,
        CreateMode.EPHEMERAL);
  } catch (KeeperException e) {
    LOG.error("Exception while entering barrier!", e);
  } catch (InterruptedException e) {
    LOG.error("Exception while entering barrier!", e);
  }
  // etc omitted ...
{noformat}
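One caveat with the loop above is that a lock that never goes away would block the peer forever. Below is a rough sketch of the same wait-then-create idea with a timeout. The NodeStore interface, the in-memory store, and the method name enterWhenLockGone are all made up for illustration; they stand in for the ZooKeeper client and are not Hama code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class BarrierEntrySketch {

    /** Hypothetical stand-in for the exists/create calls on the ZooKeeper client. */
    public interface NodeStore {
        boolean exists(String path);
        void create(String path);
    }

    /** In-memory store used only to make the sketch runnable. */
    public static class InMemoryStore implements NodeStore {
        private final Set<String> nodes = ConcurrentHashMap.newKeySet();

        @Override
        public boolean exists(String path) {
            return nodes.contains(path);
        }

        @Override
        public void create(String path) {
            if (!nodes.add(path)) {
                throw new IllegalStateException("node already exists: " + path);
            }
        }

        public void delete(String path) {
            nodes.remove(path);
        }
    }

    /**
     * Polls until the node at path is gone, then creates it.
     * Returns true if the node was created, false on timeout or interruption.
     */
    public static boolean enterWhenLockGone(NodeStore store, String path,
                                            long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (store.exists(path)) {
                if (System.nanoTime() >= deadline) {
                    return false; // stale lock still present, give up
                }
                Thread.sleep(pollMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        }
        store.create(path);
        return true;
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        String path = "/bsp/peer1"; // hypothetical lock path
        System.out.println(enterWhenLockGone(store, path, 10, 100));
        System.out.println(enterWhenLockGone(store, path, 10, 100));
    }
}
```

The check and the create are still two separate steps here, so this only helps when each peer is the sole writer of its own node.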
> Add task ID and superstep count informations to lock file
> ---------------------------------------------------------
>
> Key: HAMA-387
> URL: https://issues.apache.org/jira/browse/HAMA-387
> Project: Hama
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.2.0
> Reporter: Edward J. Yoon
> Fix For: 0.3.0
>
> Attachments: HAMA-387_v02.patch, sleepless.patch
>
>
> I think the lock file must include:
> * the job ID
> * the task ID of the lock file owner
> * the current superstep count
> to check ownership and validity.
> Currently they are named by hostname, but multiple tasks may run on a single
> groomserver in the future.
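
To make the three fields in the description concrete, here is one possible way to encode them in a lock node name and check ownership. The "-" separated layout, the class name, and the method names are assumptions for illustration only; the issue does not specify a format, and the attached patches may do this differently.

```java
public class LockNameSketch {

    // Hypothetical separator between job ID, task ID and superstep count.
    private static final String SEP = "-";

    /** Builds a node name of the form jobId-taskId-superstep. */
    public static String nodeName(String jobId, String taskId, long superstep) {
        return jobId + SEP + taskId + SEP + superstep;
    }

    /** Reads the superstep count from the last field of the node name. */
    public static long superstepOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.lastIndexOf(SEP) + 1));
    }

    /** True if the node name belongs to the given job and task. */
    public static boolean isOwnedBy(String nodeName, String jobId, String taskId) {
        String prefix = jobId + SEP + taskId + SEP;
        if (!nodeName.startsWith(prefix)) {
            return false;
        }
        String rest = nodeName.substring(prefix.length());
        // The remainder must be a plain non-negative superstep number.
        return !rest.isEmpty() && rest.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        String name = nodeName("job_1", "task_3", 7);
        System.out.println(name);
        System.out.println(isOwnedBy(name, "job_1", "task_3"));
        System.out.println(superstepOf(name));
    }
}
```

With a name like this, two tasks on the same groomserver get distinct lock nodes, and a peer can tell a leftover node from an earlier superstep apart from a current one.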