[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477098#comment-13477098
 ] 

Robert Joseph Evans commented on MAPREDUCE-4495:
------------------------------------------------

I am happy to comment just on the design as well.  I brought up where code 
should be located because I thought it would have a significant impact on 
different phases of the design.  I have a few questions about just the design 
too. 

How will the WfAM handle itself crashing, having the container it is running on 
pulled out from under it, having one if its child AMs crashing, or having those 
containers pulled out from under them, this is particularly for the V2 and V3 
architectures where the AMs are not launched by the RM, but as simple 
containers, or as threads?
Will its state be stored somewhere so it can restart where it left off?
If the WfAM does restart after a crash will it try to reestablish communication 
with App Masters running in other containers, will it kill them, or will it 
just abandon them like MR currently does?

How will the WfAM schedule containers?  The MR AM is very simple, it does not 
need anything complex, just find a free map task if there is a map container 
available, pick one closest to the data, or pick a reduce task if a reduce task 
is available.  How do you decided which AM etc has a higher priority or is it 
always FIFO?

For container reuse on MR the localized resources, the config, the JVM, and 
even the output do not really need to change between tasks, because they are 
all doing essentially the same things.  Containers for map tasks never become 
reduce tasks or vise versa.  How will the WfAM handle this? Different jobs are 
likely to have different classpaths, distributed cache setups, different 
command lines, etc. even if they all have the same resource request.

What type of changes are going to be required for a preexisting AM to be able 
to communicate with the LocalResourceCoordinator?  Ideally you could provide a 
simple shim layer that would allow the App Master to either communicate 
directly with the RM or go through the WfAM instead.  This way there really 
becomes no change for an AppMaster that wants to work with both.

How is security going to be handled?

I am also curious about how you would see us getting to what I was talking 
[about 
previously|https://issues.apache.org/jira/browse/MAPREDUCE-4495?focusedCommentId=13427387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13427387],
 and do we want to go there sooner rather then later, or in your opinion it is 
something we should not do at all?  Building the DAG and doing static 
optimizations seems very doable with the current design by inserting in a phase 
after parsing the DAG and before beginning execution of it.  But the ones that 
are more interesting to me, the ones where the DAG is much more dynamic look 
like they would require some significant rework to the design.
                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, 
> MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Less number of consumed resources, since only one application master will 
> be spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher systems like Pig 
> and hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to