[
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477098#comment-13477098
]
Robert Joseph Evans commented on MAPREDUCE-4495:
------------------------------------------------
I am happy to comment just on the design as well. I brought up where code
should be located because I thought it would have a significant impact on
different phases of the design. I have a few questions about just the design
too.
How will the WfAM handle crashing itself, having the container it is running on
pulled out from under it, having one of its child AMs crash, or having those
AMs' containers pulled out from under them? This matters particularly for the
V2 and V3 architectures, where the AMs are launched not by the RM but as simple
containers or as threads.
Will its state be stored somewhere so it can restart where it left off?
If the WfAM does restart after a crash, will it try to reestablish communication
with App Masters running in other containers, will it kill them, or will it
just abandon them like MR currently does?
How will the WfAM schedule containers? The MR AM is very simple and does not
need anything complex: if a map container is available, it finds a free map
task, preferring one closest to the data; otherwise it picks a reduce task if
one is available. How do you decide which AM, etc. has a higher priority, or is
it always FIFO?
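
To make the comparison concrete, here is a minimal sketch of the kind of selection rule described above for the MR AM. The class and method names (SimpleTaskPicker, Task, pick) are hypothetical and are not taken from the MR AM code; the sketch only assumes that each pending task knows its type and the hosts holding its input.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: these types are hypothetical, not MR AM classes. */
final class SimpleTaskPicker {
    enum Type { MAP, REDUCE }

    static final class Task {
        final String id;
        final Type type;
        final Set<String> preferredHosts; // hosts holding the task's input data

        Task(String id, Type type, String... preferredHosts) {
            this.id = id;
            this.type = type;
            this.preferredHosts = new HashSet<>(Arrays.asList(preferredHosts));
        }
    }

    /**
     * Picks a pending task of the container's type, preferring one whose data
     * lives on the container's host. Removes the chosen task from the pending
     * list; returns null if no task of that type is pending.
     */
    static Task pick(List<Task> pending, Type containerType, String containerHost) {
        Task chosen = null;
        for (Task t : pending) {
            if (t.type != containerType) {
                continue; // map containers never take reduce tasks, and vice versa
            }
            if (t.preferredHosts.contains(containerHost)) {
                chosen = t; // data-local match wins immediately
                break;
            }
            if (chosen == null) {
                chosen = t; // remember the first non-local candidate as a fallback
            }
        }
        if (chosen != null) {
            pending.remove(chosen);
        }
        return chosen;
    }
}
```

A WfAM would need something on top of this, since it has to choose between jobs (or child AMs) before it chooses between tasks, which is where the priority versus FIFO question comes in.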
For container reuse on MR the localized resources, the config, the JVM, and
even the output do not really need to change between tasks, because they are
all doing essentially the same things. Containers for map tasks never become
reduce tasks or vice versa. How will the WfAM handle this? Different jobs are
likely to have different classpaths, distributed cache setups, different
command lines, etc. even if they all have the same resource request.
What type of changes are going to be required for a preexisting AM to be able
to communicate with the LocalResourceCoordinator? Ideally you could provide a
simple shim layer that would allow the App Master to either communicate
directly with the RM or go through the WfAM instead. That way, an AppMaster
that wants to work with both would need essentially no changes.
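
As a rough illustration of the shim idea, the sketch below hides the choice behind one interface. Everything here is hypothetical: ContainerAllocator, DirectRmAllocator, and WfAmAllocator are made-up names, container IDs are plain strings, and the "RM" is a stub that hands out fresh IDs. It is not the YARN client API and not the LocalResourceCoordinator from the design document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, minimal view of what an AM asks of its resource provider. */
interface ContainerAllocator {
    List<String> allocate(int count);
    void release(String containerId);
}

/** Stand-in for talking to the RM directly; it just hands out fresh IDs. */
final class DirectRmAllocator implements ContainerAllocator {
    private int next = 0;

    @Override
    public List<String> allocate(int count) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add("container_" + (next++));
        }
        return out;
    }

    @Override
    public void release(String containerId) {
        // A real AM would hand the container back to the RM here.
    }
}

/** Stand-in for going through the WfAM: released containers are kept for reuse. */
final class WfAmAllocator implements ContainerAllocator {
    private final ContainerAllocator rm;
    private final Deque<String> idle = new ArrayDeque<>();

    WfAmAllocator(ContainerAllocator rm) {
        this.rm = rm;
    }

    @Override
    public List<String> allocate(int count) {
        List<String> out = new ArrayList<>();
        // Serve from the idle pool first, then fall back to the RM.
        while (out.size() < count && !idle.isEmpty()) {
            out.add(idle.pop());
        }
        if (out.size() < count) {
            out.addAll(rm.allocate(count - out.size()));
        }
        return out;
    }

    @Override
    public void release(String containerId) {
        idle.push(containerId); // keep it for the next job in the workflow
    }
}
```

An AppMaster written against ContainerAllocator would not care which implementation it was given. The sketch deliberately ignores the reuse problems raised above (different classpaths, distributed cache setups, and command lines), as well as security.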
How is security going to be handled?
I am also curious how you see us getting to what I was talking about
[previously|https://issues.apache.org/jira/browse/MAPREDUCE-4495?focusedCommentId=13427387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13427387].
Do we want to go there sooner rather than later, or is it, in your opinion,
something we should not do at all? Building the DAG and doing static
optimizations seems very doable with the current design by inserting a phase
after parsing the DAG and before beginning execution of it. But the
optimizations that are more interesting to me, the ones where the DAG is much
more dynamic, look like they would require some significant rework of the
design.
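
For the static case, the extra phase could be as small as a list of rewrite passes applied between parsing and execution. The sketch below is only an assumption about how that might look: WorkflowPipeline, plan, and dropCompleted are hypothetical names, the DAG is a plain map from each job to the jobs it depends on, and the example pass (skipping jobs already known to be complete) is a placeholder for a real optimization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

/** Illustrative only: parse -> static passes -> execution order. */
final class WorkflowPipeline {

    /** Applies each pass to the parsed DAG, then returns a valid run order. */
    static List<String> plan(Map<String, Set<String>> parsedDag,
                             List<UnaryOperator<Map<String, Set<String>>>> passes) {
        Map<String, Set<String>> dag = parsedDag;
        for (UnaryOperator<Map<String, Set<String>>> pass : passes) {
            dag = pass.apply(dag);
        }
        return topologicalOrder(dag);
    }

    /** Hypothetical example pass: drop jobs already known to be complete. */
    static UnaryOperator<Map<String, Set<String>>> dropCompleted(Set<String> completed) {
        return dag -> {
            Map<String, Set<String>> out = new HashMap<>();
            for (Map.Entry<String, Set<String>> e : dag.entrySet()) {
                if (completed.contains(e.getKey())) {
                    continue;
                }
                Set<String> deps = new HashSet<>(e.getValue());
                deps.removeAll(completed);
                out.put(e.getKey(), deps);
            }
            return out;
        };
    }

    /** Depth-first topological sort; jobs are visited in name order. */
    static List<String> topologicalOrder(Map<String, Set<String>> dag) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String job : new TreeSet<>(dag.keySet())) {
            visit(job, dag, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String job, Map<String, Set<String>> dag,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(job)) {
            return;
        }
        if (!inProgress.add(job)) {
            throw new IllegalStateException("cycle detected at job " + job);
        }
        Set<String> deps = dag.getOrDefault(job, Collections.emptySet());
        for (String dep : new TreeSet<>(deps)) {
            visit(dep, dag, done, inProgress, order);
        }
        inProgress.remove(job);
        done.add(job);
        order.add(job); // every dependency is already in the list
    }
}
```

Because the whole order is computed before anything runs, this shape only covers static rewrites. A DAG that changes while it executes would not fit a one-shot plan step like this.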
> Workflow Application Master in YARN
> -----------------------------------
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha
> Reporter: Bo Wang
> Assignee: Bo Wang
> Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch,
> MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of
> running a DAG of jobs. The workflow client submits a DAG request to the AM
> and then the AM will manage the life cycle of this application in terms of
> requesting the needed resources from the RM, and starting, monitoring and
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master,
> these are some of the advantages:
> - Less number of consumed resources, since only one application master will
> be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple
> consecutive jobs in the workflow (no need to request/wait for resources for
> every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the
> workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher systems like Pig
> and hive to provide an optimized way of running their workflows.