[
https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113900#comment-13113900
]
Thomas Jungblut edited comment on HAMA-387 at 9/24/11 6:00 AM:
---------------------------------------------------------------
Well, in our BSPPeer code the enter and leave barrier methods are just two RPC
calls. This is cleaner than the whole sync and notify of ZK nodes.
In addition we have our own sync service, which can now keep track of the
superstep and any additional information we want to keep there, for example
which tasks are currently within the barrier. So we don't need ZooKeeper at all.
We could possibly also deregister tasks, so that we can adjust the number of
tasks needed to trip the barrier at runtime. For that we could add another
method, some kind of waitToHalt(), which deregisters the task from the sync
service.
Besides that, I think this is faster than the ZK barrier sync.
So to summarize: we would have full control, it is our own code, and there is
no external dependency. It is cleaner, and we can implement new features more
easily with it.
And I guess I will take the sync code for the MR NG integration, just because
it is its own service and I don't want to debug the BSPPeer barrier code.
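
The sync service idea above can be illustrated with a minimal in-memory
sketch. This is not the code from the attached patches: the class name
BarrierSyncSketch and all method names except the enter/leave pair are
hypothetical, the RPC layer is left out entirely, and counting a superstep as
one enter plus one leave is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical in-memory stand-in for a sync service. In the design
 * discussed here, enterBarrier and leaveBarrier would be RPC calls.
 */
class BarrierSyncSketch {
    private final Set<String> registered = new HashSet<>();
    private final Set<String> inBarrier = new HashSet<>();
    private long generation = 0; // incremented each time a barrier trips

    synchronized void register(String taskId) {
        registered.add(taskId);
    }

    /** Removes a task, so fewer arrivals are needed to trip the barrier. */
    synchronized void deregister(String taskId) {
        registered.remove(taskId);
        inBarrier.remove(taskId);
        tripIfComplete(); // the remaining tasks may all be waiting already
    }

    void enterBarrier(String taskId) {
        await(taskId);
    }

    void leaveBarrier(String taskId) {
        await(taskId);
    }

    /** Assumption: one superstep equals one enter plus one leave. */
    synchronized long getSuperstep() {
        return generation / 2;
    }

    /** Snapshot of the tasks currently waiting inside the barrier. */
    synchronized Set<String> tasksInBarrier() {
        return new HashSet<>(inBarrier);
    }

    private synchronized void await(String taskId) {
        if (!registered.contains(taskId)) {
            throw new IllegalStateException("not registered: " + taskId);
        }
        long arrivedAt = generation;
        inBarrier.add(taskId);
        tripIfComplete();
        try {
            // Loop guards against spurious wakeups.
            while (generation == arrivedAt) {
                wait();
            }
        } catch (InterruptedException e) {
            inBarrier.remove(taskId);
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted in barrier", e);
        }
    }

    // Caller must hold the monitor on this object.
    private void tripIfComplete() {
        if (!registered.isEmpty() && inBarrier.containsAll(registered)) {
            inBarrier.clear();
            generation++;
            notifyAll();
        }
    }
}
```

Because the service owns the set of registered tasks, a waitToHalt()-style
call only needs to invoke deregister; the barrier then trips as soon as the
remaining tasks have arrived.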
was (Author: thomas.jungblut):
Well, in our groom code the enter and leave barrier methods are just two
RPC calls. This is cleaner than the whole sync and notify of ZK nodes.
In addition we have our own sync service, which can now keep track of the
superstep and any additional information we want to keep there, for example
which tasks are currently within the barrier. So we don't need ZooKeeper at all.
We could possibly also deregister tasks, so that we can adjust the number of
tasks needed to trip the barrier at runtime. For that we could add another
method, some kind of waitToHalt(), which deregisters the task from the sync
service.
Besides that, I think this is faster than the ZK barrier sync.
So to summarize: we would have full control, it is our own code, and there is
no external dependency. It is cleaner, and we can implement new features more
easily with it.
And I guess I will take the sync code for the MR NG integration, just because
it is its own service and I don't want to debug the BSPPeer barrier code.
> Advanced Barrier Synchronization
> --------------------------------
>
> Key: HAMA-387
> URL: https://issues.apache.org/jira/browse/HAMA-387
> Project: Hama
> Issue Type: Improvement
> Components: bsp
> Affects Versions: 0.3.0
> Reporter: Edward J. Yoon
> Assignee: ChiaHung Lin
> Fix For: 0.4.0
>
> Attachments: HAMA-387.patch, HAMA-387_v02.patch, HAMA-387_v03.patch,
> HAMA-387_v04.patch, doublebarrier.patch, new.patch, ownSyncService.patch,
> ownSyncService_v2.patch, ownSyncService_v3.patch, sleepless.patch, x.PNG,
> x.patch
>
>
> I think the lock file must include:
> * the job ID
> * the task ID of the lock file owner
> * the current superstep count
> so that ownership and validity can be checked.
> Currently lock files are named by hostname, but multiple tasks may run on a
> single groomserver in the future.
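
As an illustration of the three fields listed in the description, here is a
minimal sketch of a lock name that carries the job ID, the owner's task ID and
the superstep count. The class BarrierLockName, the "/" separator, the field
order and the sample IDs are all assumptions for the example, not Hama's
actual naming scheme.

```java
/**
 * Hypothetical value class for a barrier lock name.
 * Separator and field order are assumptions, not Hama's format.
 */
final class BarrierLockName {
    final String jobId;
    final String taskId;
    final long superstep;

    BarrierLockName(String jobId, String taskId, long superstep) {
        this.jobId = jobId;
        this.taskId = taskId;
        this.superstep = superstep;
    }

    /** Encodes the three fields as jobId/taskId/superstep. */
    String toNodeName() {
        return jobId + "/" + taskId + "/" + superstep;
    }

    /** Parses a name produced by toNodeName(). */
    static BarrierLockName parse(String name) {
        String[] parts = name.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException("malformed lock name: " + name);
        }
        return new BarrierLockName(parts[0], parts[1], Long.parseLong(parts[2]));
    }

    /** Ownership check: same job and same task. */
    boolean isOwnedBy(String otherJobId, String otherTaskId) {
        return jobId.equals(otherJobId) && taskId.equals(otherTaskId);
    }

    /** Validity check: the lock belongs to the current superstep. */
    boolean isValidFor(long currentSuperstep) {
        return superstep == currentSuperstep;
    }
}
```

Keying the name on the task ID rather than the hostname keeps names unique
when several tasks run on the same groomserver.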