[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473513#comment-13473513 ]
Robert Joseph Evans commented on MAPREDUCE-4495:
------------------------------------------------
I really do like the idea of having an AM that can run a workflow. I think
there is huge potential here and I want to see this move forward, but
the size and scope of this change are a lot to take in. There are 11,734 lines
in the patch. I realize that a lot of this was taken from Oozie itself, but
then how are we going to keep the two in sync? What happens when Oozie finds a
bug? How are we going to be sure that the fix is pulled into mapred? I really
would prefer to see a more agile approach to these changes, and hopefully some
of them can coincide with MR, YARN, and HDFS splitting apart after 2.0 has
stabilized, so Arun's fears about Hadoop returning to being a project of
projects can be alleviated.
Can we look at moving the parts that can be common between Oozie and the
workflow AM into a separate project? That project I would expect to eventually
own the complete Workflow AM, but in the short term it would just provide a
place for this workflow library. In parallel with that we can move forward and
put in a simple AM that allows for the existing JobControl API to run in an AM.
This would allow us to validate that the MR AM is thread safe, and keep it
that way. It would also offer a potentially huge benefit to Pig, which
currently uses that API. I would expect most of the initial code for this
JobControl workflow AM to be replaced as it moves to use the common workflow
library.
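To make the JobControl idea concrete, here is a rough sketch of the core thing
such an AM has to do: take jobs with declared dependencies and run each one
only after everything it depends on has finished. The class and method names
(JobGraphSketch, addJob, runOrder) are made up for illustration and do not use
Hadoop's JobControl classes; a real AM would also launch the jobs rather than
just compute an order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a JobControl-style job graph; NOT the Hadoop API.
class JobGraphSketch {
    // Job name -> names of the jobs it depends on (kept in insertion order).
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    void addJob(String name, String... dependsOn) {
        deps.put(name, List.of(dependsOn));
    }

    // Returns an order in which every job comes after all of its dependencies.
    // Throws if a dependency was never added or the graph contains a cycle.
    List<String> runOrder() {
        List<String> done = new ArrayList<>();
        while (done.size() < deps.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                boolean ready = done.containsAll(e.getValue());
                if (!done.contains(e.getKey()) && ready) {
                    done.add(e.getKey()); // a real AM would launch the job here
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cycle or missing dependency");
            }
        }
        return done;
    }

    public static void main(String[] args) {
        JobGraphSketch g = new JobGraphSketch();
        g.addJob("join", "extractA", "extractB");
        g.addJob("extractA");
        g.addJob("extractB");
        g.addJob("report", "join");
        System.out.println(g.runOrder());
    }
}
```

Running jobs concurrently inside one AM, rather than one at a time as above,
is where the thread-safety question for the MR AM would actually get exercised.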
By doing this in an agile fashion it would also allow us to work out a number
of potential issues I see when moving this from Oozie, which uses a DB to store
its state, to a workflow AM, where that is not possible. By doing an initial
simple JobControl AM we can work out some of the issues with restarting the AM
after it crashes. What is more, by keeping the changes small, it is much more
likely to be something that can be merged into branch 2, so that the branches
do not diverge nearly as much.
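As one possible shape for the restart problem, and only as a sketch: assume
the AM appends a line to some durable log each time a job finishes (a file on
HDFS would be one option; that choice, the "DONE <job>" line format, and the
RestartSketch name are all my assumptions, not anything in the patch). A
restarted AM could then replay the log and skip the jobs that already
completed.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical recovery helper; the journal format is an assumption.
class RestartSketch {
    private static final String DONE_PREFIX = "DONE ";

    // Replays journal lines and collects the names of completed jobs.
    // Lines that do not start with "DONE " are ignored.
    static Set<String> completed(List<String> journal) {
        Set<String> done = new LinkedHashSet<>();
        for (String line : journal) {
            if (line.startsWith(DONE_PREFIX)) {
                done.add(line.substring(DONE_PREFIX.length()));
            }
        }
        return done;
    }

    // Jobs a restarted AM still has to run, in their original order.
    static List<String> remaining(List<String> allJobs, List<String> journal) {
        Set<String> done = completed(journal);
        List<String> todo = new ArrayList<>();
        for (String job : allJobs) {
            if (!done.contains(job)) {
                todo.add(job);
            }
        }
        return todo;
    }
}
```

A job that was started but has no DONE line is simply rerun here; whether that
is acceptable, or whether in-flight jobs need to be reattached, is exactly the
kind of issue a small JobControl AM would let us work out.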
> Workflow Application Master in YARN
> -----------------------------------
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha
> Reporter: Bo Wang
> Assignee: Bo Wang
> Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch,
> MapReduceWorkflowAM.pdf
>
>
> It is useful to have a workflow application master, which will be capable of
> running a DAG of jobs. The workflow client submits a DAG request to the AM
> and then the AM will manage the life cycle of this application in terms of
> requesting the needed resources from the RM, and starting, monitoring and
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master,
> these are some of the advantages:
> - Fewer consumed resources, since only one application master will
> be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple
> consecutive jobs in the workflow (no need to request/wait for resources for
> every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the
> workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher-level systems
> like Pig and Hive to provide an optimized way of running their workflows.