[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428429#comment-13428429
 ] 

eric baldeschwieler commented on MAPREDUCE-4495:
------------------------------------------------

Agree with discussing a particular proposal.

I want to point out that the whole point of YARN is to open up the ability to 
try lots of different changes to MR and to implement lots of alternatives to it 
in parallel.  As a community, we need to be clear that to move fast we need to 
let lots of different people try lots of different things on top of a stable 
platform.  Pig and Hive folks want to radically change what MR is.  There are 
lots of different ideas for how to do this. 

With open APIs everyone is empowered to try new things without asking to get 
their code into the core project.  If we don't embrace the principle of new AMs 
starting outside the core, we are going to have a huge number of arguments like 
this without making anyone happy.  That's not the best way for us to spend our 
time.  I'm not trying to stop anyone from trying anything; I'm trying to reduce 
friction.

My last point is the overhead argument.  Arguing that one doesn't want to go to 
the Incubator because that adds cost to one's project really doesn't look at 
the whole picture.  Adding a new module or sub-project to an existing Apache 
project creates as much work as doing it in the incubator.  It just tosses that 
work into the lap of the folks maintaining the existing project.  Apache being 
about community before code doesn't mean anyone has a right to do anything in 
the code.  You first need to build consensus that your coding idea is aligned 
with the community.  Any time you add something to 
a project, you are implicitly asking the others in the community to do a lot of 
work to support you.  That only makes sense if you are working in a direction 
that the community sees as aligned with the larger goals of the project.

Going full circle, YARN's open APIs aim to let people try a lot more things 
much less expensively.  They don't need to get permission to merge their work 
into MR, which is good for experimenters.  Hadoop committers are not burdened 
with vetting and supporting many different experiments in Hadoop.  The 
experimenters carry the burden of building community and supporting / selling 
their ideas.  This should save us a lot of time arguing on this list!  ;-)


                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer resources consumed, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems like 
> Pig and Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
