[
https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477617#comment-13477617
]
Alejandro Abdelnur commented on MAPREDUCE-4495:
-----------------------------------------------
Bobby, thanks for taking the time to go over the doc/code. Below are some
answers/comments to your feedback.
On *How will the WFAM handle itself crashing…*: an important part of the
design/implementation is that all WF state changes can be stored in a
consistent way after every signal call to the WF. This enables dumping the
updated state to a file in the form of an edit log; this file can be in HDFS.
On restart the WFAM would read the files from HDFS and reconstruct its state
(this is not implemented yet, but it is quite similar to how we do things today
in Oozie using the DB).
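To make the edit-log idea concrete, here is a minimal sketch of the recovery
path. It is not the WFAM code: WorkflowState, EditLog and signal are
hypothetical names, the JSON-lines record format is an assumption, and a local
file stands in for the file in HDFS.

```python
import json
import os
import tempfile


class WorkflowState:
    """Hypothetical in-memory WF state: node name -> last known status."""

    def __init__(self):
        self.node_status = {}

    def apply(self, edit):
        # Each edit records one state change produced by a signal call.
        self.node_status[edit["node"]] = edit["status"]


class EditLog:
    """Append-only edit log. A local file stands in for a file in HDFS."""

    def __init__(self, path):
        self.path = path

    def append(self, edit):
        # One JSON record per line (assumed format), flushed before returning.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(edit) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Rebuild the state from scratch by re-applying every logged edit.
        state = WorkflowState()
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    state.apply(json.loads(line))
        return state


def signal(state, log, node, status):
    """Persist the state change first, then apply it in memory."""
    edit = {"node": node, "status": status}
    log.append(edit)
    state.apply(edit)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "wf.editlog")
    log, state = EditLog(path), WorkflowState()
    signal(state, log, "mr-1", "RUNNING")
    signal(state, log, "mr-1", "SUCCEEDED")
    # Simulated restart: a fresh process only has the log file.
    print(EditLog(path).replay().node_status)
```

Writing the edit before applying it in memory means a crash between the two
steps loses nothing that was acknowledged; the replay simply re-applies it.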
On *If the WFAM does restart after a crash will it try to reestablish
communication with App Masters*, the WFAM would be able to reconnect with
children AMs without any issue, as they would have continued working without
knowing that the parent WFAM got restarted. It would just use their async
client APIs.
On *How will the WFAM schedule containers*, the original idea is to do a
passthrough to the RM; later this proxy may become more sophisticated and have
a heuristic to reuse containers when it makes sense. This would be possible
when the WFAM is using embedded AMs (i.e. an MRAM to run an MR job) and the
embedded AMs support injection of Container implementations (i.e. to replace
the default container allocation).
On *How do you decided which AM etc has a higher priority…*, we are constrained
by a DAG, thus current DAG nodes get to run before upcoming ones.
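As an illustration of that ordering (a sketch only, not the WFAM scheduler):
a node becomes "current" once all of its parents have completed, and current
nodes are run before anything downstream. The function names and the
dict-of-sets DAG representation are assumptions made for the example.

```python
def ready_nodes(dag, done):
    """Nodes not yet done whose parents have all completed, in sorted order."""
    return sorted(
        node for node, parents in dag.items()
        if node not in done and parents <= done
    )


def run_order(dag):
    """Group nodes into waves; every wave runs before the next one starts."""
    done = set()
    waves = []
    while len(done) < len(dag):
        wave = ready_nodes(dag, done)
        if not wave:
            # Nothing is runnable but work remains: a cycle or unknown parent.
            raise ValueError("not a valid DAG")
        waves.append(wave)
        done.update(wave)
    return waves


if __name__ == "__main__":
    # dag maps each node to the set of nodes it depends on.
    dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(run_order(dag))
```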
On *How security going to be handled?*, it is no different from how it is
handled in MRAM.
On *I'm also curious about how you would see us getting to what I was talking
about previously..*, I agree with the direction/approach you are proposing in
your previous comment. Furthermore, I'm currently prototyping, after Arun's
suggestion, a JobControl subclass that converts the job dependency tree into a
workflow XML (which would then be executed by the WFAM or submitted to
Oozie). If the prototype works as expected, I'm planning to open a JIRA
introducing a JobControlFactory which would return the current JobControl by
default, but via a configuration property could instead return a WFJobControl
implementation based on the prototype I've just described. Then, changing Pig
to use the JobControlFactory to create the JobControl instead of a constructor
would give the flexibility of executing Pig in a WFAM or in a current version
of Oozie.
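The shape of that factory could look roughly like the sketch below. It is
written with stand-in classes rather than the real Hadoop JobControl, and
everything in it is an assumption made for illustration: the property name
"jobcontrol.impl", the add_job/run methods, and the XML layout (which is not
the Oozie workflow schema).

```python
import xml.etree.ElementTree as ET

IMPL_KEY = "jobcontrol.impl"  # hypothetical configuration property name


class JobControl:
    """Stand-in for the current JobControl: runs the jobs itself."""

    def __init__(self, group_name):
        self.group_name = group_name
        self.jobs = []  # list of (job name, [names it depends on])

    def add_job(self, name, depends_on=()):
        self.jobs.append((name, list(depends_on)))

    def run(self):
        return "ran %d job(s) directly" % len(self.jobs)


class WFJobControl(JobControl):
    """Stand-in subclass: converts the dependency tree into workflow XML."""

    def to_workflow_xml(self):
        # Illustrative XML layout only; not the Oozie schema.
        root = ET.Element("workflow", {"name": self.group_name})
        for name, deps in self.jobs:
            job = ET.SubElement(root, "job", {"name": name})
            for dep in deps:
                ET.SubElement(job, "depends-on").text = dep
        return ET.tostring(root, encoding="unicode")

    def run(self):
        # A real implementation would hand this XML to the WFAM or Oozie.
        return self.to_workflow_xml()


class JobControlFactory:
    """Returns the default JobControl unless the configuration says otherwise."""

    _IMPLS = {"default": JobControl, "workflow": WFJobControl}

    @staticmethod
    def create(conf, group_name):
        impl = conf.get(IMPL_KEY, "default")
        if impl not in JobControlFactory._IMPLS:
            raise ValueError("unknown JobControl implementation: %s" % impl)
        return JobControlFactory._IMPLS[impl](group_name)


if __name__ == "__main__":
    jc = JobControlFactory.create({IMPL_KEY: "workflow"}, "pig-script")
    jc.add_job("load")
    jc.add_job("join", depends_on=["load"])
    print(jc.run())
```

The caller (Pig, in the scenario above) only sees the factory, so switching
between direct execution and workflow generation is a configuration change.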
Finally, on your comment on dynamic workflow generation, it is definitely
doable today using the WorkflowLib API directly. I could put together an
example early next week.
> Workflow Application Master in YARN
> -----------------------------------
>
> Key: MAPREDUCE-4495
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.0-alpha
> Reporter: Bo Wang
> Assignee: Bo Wang
> Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch,
> MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of
> running a DAG of jobs. The workflow client submits a DAG request to the AM
> and then the AM will manage the life cycle of this application in terms of
> requesting the needed resources from the RM, and starting, monitoring and
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master,
> these are some of the advantages:
> - Fewer consumed resources, since only one application master will
> be spawned for the whole workflow.
> - Reuse of resources, since the same resources can be used by multiple
> consecutive jobs in the workflow (no need to request/wait for resources for
> every individual job from the central RM).
> - More optimization opportunities in terms of collective resource requests.
> - Optimization opportunities in terms of rewriting and composing jobs in the
> workflow (e.g. pushing down Mappers).
> - This Application Master can be reused/extended by higher systems like Pig
> and Hive to provide an optimized way of running their workflows.