[
https://issues.apache.org/jira/browse/HAMA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102263#comment-13102263
]
Vinod Kumar Vavilapalli commented on HAMA-431:
----------------------------------------------
Long comment, the leisure of the weekend :)
Good to see the ball rolling.
I browsed through the current HAMA code (let's call this HamaV1)
and the mapreduce-integration branch (this should really be Yarn-integration;
let's call this HamaV2).
Some thoughts follow. Some of the following may be naive as I am new around
here :)
*Regarding the Job and Task state machines*: Yes, it does look like you don't
need a lot of states and their corresponding transitions here, from what I can
see of HamaV1's _JobInProgress_ and _TaskInProgress_. Is that because you don't
have good failure handling in HamaV1 (as I read in one of the presentations)? If
that isn't true, ignore what follows. Otherwise, I think it is the right time
to think about fault tolerance (if at all) and write down the state machines so
that they include the failure scenarios.
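To make the state-machine point concrete, here is a minimal sketch of a task state machine that has failure and retry transitions written down explicitly. The state names and the {{TaskStateMachine}} class are hypothetical and are not taken from the HamaV1 or HamaV2 code; the real set of states would come out of the design discussion.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical task states; names are illustrative, not taken from the Hama code. */
enum TaskState { NEW, SCHEDULED, RUNNING, SUCCEEDED, FAILED, KILLED }

/** Minimal table-driven state machine that rejects undeclared transitions. */
final class TaskStateMachine {
    private static final Map<TaskState, Set<TaskState>> ALLOWED =
            new EnumMap<>(TaskState.class);

    static {
        ALLOWED.put(TaskState.NEW,
                EnumSet.of(TaskState.SCHEDULED, TaskState.KILLED));
        ALLOWED.put(TaskState.SCHEDULED,
                EnumSet.of(TaskState.RUNNING, TaskState.FAILED, TaskState.KILLED));
        ALLOWED.put(TaskState.RUNNING,
                EnumSet.of(TaskState.SUCCEEDED, TaskState.FAILED, TaskState.KILLED));
        // Assumed retry edge: a failed task may be scheduled again.
        ALLOWED.put(TaskState.FAILED, EnumSet.of(TaskState.SCHEDULED));
        // Terminal states have no outgoing transitions.
        ALLOWED.put(TaskState.SUCCEEDED, EnumSet.noneOf(TaskState.class));
        ALLOWED.put(TaskState.KILLED, EnumSet.noneOf(TaskState.class));
    }

    private TaskState current = TaskState.NEW;

    synchronized TaskState current() {
        return current;
    }

    /** Moves to the next state, or throws if the transition was never declared. */
    synchronized void transition(TaskState next) {
        if (!ALLOWED.get(current).contains(next)) {
            throw new IllegalStateException(current + " -> " + next + " is not allowed");
        }
        current = next;
    }
}
```

Keeping the allowed transitions in one table means each failure scenario shows up as an explicit edge that can be reviewed, rather than as an implicit code path.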
*Implementation of barrier synchronization*: Not sure of the problems you ran
into with ZooKeeper in HamaV1, but can't we use the _ApplicationMaster_ (AM) in
HamaV2 as a barrier synchronization service? Each {{BSPPeer}} could
periodically poll the AM to ask whether it can proceed to the next superstep. If
and when the AM goes down, all the BSPPeers just wait there, spinning, until the
AM is restarted by the Yarn _ResourceManager_.
-- Pros: Avoiding ZooKeeper frees BSP from the ZK external dependency, one
less service needed for running HAMA apps.
-- Cons: It robs HAMA of the notification push via ZK's watcher
mechanism (notification push vs. periodic pull). (This should be agreeable, no?)
Thoughts?
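The polling idea above could look roughly like the following sketch. It assumes a fixed, known number of peers and models the AM side as an in-process object; {{SuperstepBarrier}} and {{BarrierClient}} are hypothetical names, not existing Hama or YARN classes, and in a real deployment the two calls would be RPCs from each {{BSPPeer}} to the AM.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical barrier service hosted in the AM; not an existing Hama or YARN class. */
final class SuperstepBarrier {
    private final int peerCount;
    private final Set<String> arrived = new HashSet<>();
    // Highest superstep that every peer has finished; -1 means none yet.
    private long releasedSuperstep = -1;

    SuperstepBarrier(int peerCount) {
        this.peerCount = peerCount;
    }

    /** A peer reports that it has finished the given superstep. */
    synchronized void enter(String peerId, long superstep) {
        if (superstep != releasedSuperstep + 1) {
            return; // stale or out-of-order report: ignore it
        }
        arrived.add(peerId); // a set, so duplicate reports are harmless
        if (arrived.size() == peerCount) {
            releasedSuperstep = superstep;
            arrived.clear();
        }
    }

    /** Polled by peers: true once every peer has finished the given superstep. */
    synchronized boolean mayProceed(long superstep) {
        return superstep <= releasedSuperstep;
    }
}

/** Peer-side helper: report arrival, then poll until the barrier opens. */
final class BarrierClient {
    static void sync(SuperstepBarrier am, String peerId, long superstep, long pollMillis)
            throws InterruptedException {
        am.enter(peerId, superstep);
        while (!am.mayProceed(superstep)) {
            // Assumption: with a remote AM, a failed call would simply be
            // retried here until the ResourceManager has restarted the AM.
            Thread.sleep(pollMillis);
        }
    }
}
```

Note that this sketch keeps the barrier state only in memory, so a restarted AM would need the peers to report their superstep again (or would need to persist the state); that is one of the details the design would have to settle.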
*Regarding use of MR classes*:
- _Reuse of MRV2 classes_: I was appalled by the amount of Hadoop MapReduce
code (kinda) forked in HamaV1. Glad that with Yarn and HamaV2, most of the
forking will be gone. Still, one look at the HamaV2 code you have at Google
Code tells me you are trying to mimic MRV2 (MapReduce over YARN) internals.
IMO, that isn't needed, as the Job, Task, TaskAttempt etc. in MR carry concepts
specific to MapReduce, like Map/Reduce tasks. I think we can redesign the
objects HAMA needs here with far more ease, and the result would be cleaner
too.
- _Code reuse from MRV2_: OTOH, I do clearly see that we should re-use MRV2
components like ContainerLauncher (launches containers on nodes) and
RMContainerAllocator (requests containers from ResourceManager). I'll see how we
can move these out of MRV2 into a separate common library module so that Hama
(and possibly others) can use them.
*Meta comment*: Instead of jumping into writing the implementation, I think it
helps to spend some time developing the design until it reaches some level of
stability, then writing down the module structure (like BspAppMaster module,
BspChild module etc.), followed by the interfaces of all the data objects and
the components, and finally wiring them together. Once we have all the
interfaces and communication patterns in place, implementation can be done in
parallel. It helped us write MRV2 a lot more cleanly; I am sure it will help us
here too.
*General infra thought*: I think having this branch in Apache SVN helps HAMA's
incubation status. It will also make it easy for anyone else on the current
hama-dev list who is interested in working on this to use the Apache lists, SVN
etc. (Oh, BTW, I am looking to collaborate too :) ). What do you think?
> MapReduce NG integration
> ------------------------
>
> Key: HAMA-431
> URL: https://issues.apache.org/jira/browse/HAMA-431
> Project: Hama
> Issue Type: New Feature
> Reporter: Thomas Jungblut
> Assignee: Thomas Jungblut
>
> We should take a look at how to integrate Hama's BSP Engine to Hadoop's
> nextGen application platform.
> Can be currently found in the 0.23 branch.