[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473513#comment-13473513
 ] 

Robert Joseph Evans commented on MAPREDUCE-4495:
------------------------------------------------

I really do like the idea of having an AM that can run a workflow.  I think 
that there is a huge potential here and I want to see this move forward, but 
the size and scope of this change are a lot to take in. There are 11,734 lines 
in the patch.  I realize that a lot of this was taken from Oozie itself, but 
then how are we going to keep the two in sync?  What happens when Oozie finds a 
bug?  How are we going to be sure that the fix is pulled into mapred?  I really 
would prefer to see a more agile approach to these changes, and hopefully some 
of them can correspond to MR, YARN, and HDFS splitting apart after 2.0 has 
stabilized, so Arun's fears about Hadoop returning to being a project of projects 
can be alleviated.

Can we look at moving the parts that can be common between Oozie and the 
workflow AM into a separate project? That project I would expect to eventually 
own the complete Workflow AM, but in the short term it would just provide a 
place for this workflow library.  In parallel with that we can move forward and 
put in a simple AM that allows the existing JobControl API to run in an AM. 
 This would allow us to validate that the MR AM is thread safe, and keep it 
that way.  It would also offer a potentially huge benefit to Pig, which 
currently uses that API.  I would expect most of the initial code for this 
JobControl workflow AM to be replaced as it moves to use the common workflow 
library.  

Doing this in an agile fashion would also allow us to work out a number of 
potential issues I see in moving this from Oozie, which uses a DB to store its 
state, to a workflow AM, where that is not possible.  By starting with a simple 
JobControl AM we can work out some of the issues with restarting the AM after 
it crashes.  What is more, keeping the changes small makes it much more likely 
that they can be merged into branch 2, so that the branches do not diverge 
nearly as much.
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, 
> MapReduceWorkflowAM.pdf
>
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like 
> Pig and Hive to provide an optimized way of running their workflows.
