[ https://issues.apache.org/jira/browse/MAPREDUCE-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477617#comment-13477617 ]

Alejandro Abdelnur commented on MAPREDUCE-4495:
-----------------------------------------------

Bobby, thanks for taking the time to go over the doc/code. Here are some 
answers/comments on your feedback.

On *How will the WFAM handle itself crashing…*: an important part of the 
design/implementation is that all WF state changes can be stored in a 
consistent way after every signal call to the WF. This enables dumping the 
updated state to a file in the form of an edit log; this file can be in HDFS. 
On restart the WFAM would read the files from HDFS and reconstruct its state 
(this is not implemented yet, but it is quite similar to how we do things today 
in Oozie using the DB).
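The edit-log idea can be sketched as follows. This is a hypothetical illustration, not the WFAM code: the class `WorkflowEditLog` and the one-line `<node>\t<state>` record format are assumptions, and a local file stands in for the HDFS file. Every state transition is appended to the log before the in-memory state is updated, and recovery replays the log in order so the last entry for each node wins.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an append-only workflow edit log.
// A local file stands in for the HDFS file; no fsync/hsync is performed here.
class WorkflowEditLog {
    private final Path logFile;
    private final Map<String, String> nodeStates = new LinkedHashMap<>();

    WorkflowEditLog(Path logFile) {
        this.logFile = logFile;
    }

    // Append the transition as "<node>\t<state>" before updating memory,
    // so the log always contains at least what the in-memory state reflects.
    void recordTransition(String node, String newState) throws IOException {
        String entry = node + "\t" + newState + System.lineSeparator();
        Files.write(logFile, entry.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        nodeStates.put(node, newState);
    }

    // Rebuild the state after a restart by replaying every entry in order;
    // a later entry for the same node overwrites an earlier one.
    static WorkflowEditLog recover(Path logFile) throws IOException {
        WorkflowEditLog log = new WorkflowEditLog(logFile);
        if (Files.exists(logFile)) {
            for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    log.nodeStates.put(parts[0], parts[1]);
                }
            }
        }
        return log;
    }

    // Returns a copy of the current per-node state.
    Map<String, String> states() {
        return new LinkedHashMap<>(nodeStates);
    }
}
```

A real implementation writing to HDFS would also need to flush/sync the log after each signal so a crash cannot lose an acknowledged transition; the sketch only shows the append-and-replay structure.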

On *If the WFAM does restart after a crash will it try to reestablish 
communication with App Masters*, the WFAM would be able to reconnect with the 
children AMs without any issue, as they would have continued working without 
knowing that the parent WFAM got restarted; it would just use their async 
client APIs.

On *How will the WFAM schedule containers*, the original idea is to do a 
passthrough to the RM; later this proxy may become more sophisticated and have 
a heuristic to reuse containers when it makes sense. This would be possible 
when the WFAM is using embedded AMs (i.e. an MRAM to run an MR job) and the 
embedded AMs support injection of Container implementations (i.e. to replace 
the default container allocation).
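The passthrough-first, reuse-later progression could look roughly like this. Everything here is hypothetical: `ContainerAllocator`, `PooledContainer` and both implementations are illustrative names, not the YARN `AMRMClient` API. `PassthroughAllocator` stands in for asking the RM for every container, while `ReusingAllocator` wraps it and applies one simple heuristic: hand out an idle container that is at least as large as the request before going back to the RM.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical container handle; not org.apache.hadoop.yarn.api.records.Container.
final class PooledContainer {
    final String id;
    final int memoryMb;

    PooledContainer(String id, int memoryMb) {
        this.id = id;
        this.memoryMb = memoryMb;
    }
}

// Hypothetical injection point an embedded AM could use instead of its default allocation.
interface ContainerAllocator {
    PooledContainer allocate(int memoryMb);
    void release(PooledContainer container);
}

// Stand-in for the initial design: every request goes straight to the RM.
class PassthroughAllocator implements ContainerAllocator {
    private int nextId = 0;
    int rmRequests = 0;

    @Override
    public PooledContainer allocate(int memoryMb) {
        rmRequests++;
        return new PooledContainer("container_" + nextId++, memoryMb);
    }

    @Override
    public void release(PooledContainer container) {
        // A real AM would hand the container back to the RM here.
    }
}

// Later refinement: keep released containers and reuse them for the next job.
class ReusingAllocator implements ContainerAllocator {
    private final ContainerAllocator delegate;
    private final Deque<PooledContainer> idle = new ArrayDeque<>();

    ReusingAllocator(ContainerAllocator delegate) {
        this.delegate = delegate;
    }

    @Override
    public PooledContainer allocate(int memoryMb) {
        // Heuristic: reuse the first idle container that is large enough.
        for (Iterator<PooledContainer> it = idle.iterator(); it.hasNext(); ) {
            PooledContainer c = it.next();
            if (c.memoryMb >= memoryMb) {
                it.remove();
                return c;
            }
        }
        return delegate.allocate(memoryMb);
    }

    @Override
    public void release(PooledContainer container) {
        // Keep the container instead of returning it to the RM.
        idle.addLast(container);
    }
}
```

Wrapping the passthrough allocator rather than replacing it keeps the reuse policy optional: the WFAM could start with `PassthroughAllocator` alone and add the pool later without changing the embedded AMs.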

On *How do you decide which AM etc has a higher priority…*, we are constrained 
by the DAG, so the DAG nodes that are currently runnable get to run before 
upcoming ones.
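The DAG constraint amounts to a topological ordering: a node becomes runnable only once all of its parents have completed. The sketch below is a hypothetical illustration (the class `DagScheduler` and its methods are not part of the patch) that uses Kahn's algorithm to group workflow nodes into "waves", where every node in a wave depends only on nodes in earlier waves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: group DAG nodes into waves with Kahn's algorithm.
class DagScheduler {
    private final Map<String, List<String>> children = new LinkedHashMap<>();
    private final Map<String, Integer> pendingParents = new LinkedHashMap<>();

    void addNode(String node) {
        children.putIfAbsent(node, new ArrayList<>());
        pendingParents.putIfAbsent(node, 0);
    }

    // Declare that 'child' may only start after 'parent' has completed.
    void addEdge(String parent, String child) {
        addNode(parent);
        addNode(child);
        children.get(parent).add(child);
        pendingParents.merge(child, 1, Integer::sum);
    }

    // Each inner list holds nodes whose parents all appear in earlier lists.
    List<List<String>> waves() {
        Map<String, Integer> remaining = new LinkedHashMap<>(pendingParents);
        List<String> current = new ArrayList<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                current.add(e.getKey());
            }
        }
        List<List<String>> result = new ArrayList<>();
        while (!current.isEmpty()) {
            result.add(current);
            List<String> next = new ArrayList<>();
            for (String node : current) {
                for (String child : children.get(node)) {
                    // A child becomes runnable when its last parent completes.
                    int left = remaining.merge(child, -1, Integer::sum);
                    if (left == 0) {
                        next.add(child);
                    }
                }
            }
            current = next;
        }
        int scheduled = result.stream().mapToInt(List::size).sum();
        if (scheduled != children.size()) {
            throw new IllegalStateException("workflow graph has a cycle");
        }
        return result;
    }
}
```

For edges a→c, b→c and c→d this yields the waves `[a, b]`, `[c]`, `[d]`. A running WFAM would react to completion events one node at a time rather than precomputing waves, but the ordering rule is the same.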

On *How is security going to be handled?*, no differently from how it is 
handled in the MRAM.

On *I'm also curious about how you would see us getting to what I was talking 
about previously..*, I agree with the direction/approach you are proposing in 
your previous comment. Furthermore, following Arun's suggestion, I'm currently 
prototyping a JobControl subclass that converts the job dependency tree into a 
workflow XML (which would then be executed by the WFAM or submitted to Oozie). 
If the prototype works as expected, I'm planning to open a JIRA introducing a 
JobControlFactory, which would return the current JobControl by default but, 
via a configuration property, could instead return a WFJobControl 
implementation based on the prototype I've just described. Changing Pig to use 
the JobControlFactory to create the JobControl, instead of calling a 
constructor directly, would then give the flexibility of executing Pig in a 
WFAM or in a current version of Oozie.
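The proposed JobControlFactory could be sketched as below. This is a hypothetical, self-contained illustration: `JobControlStub` and `WFJobControlStub` stand in for Hadoop's `JobControl` and the prototype `WFJobControl`, `java.util.Properties` stands in for a Hadoop `Configuration`, and the property name `example.jobcontrol.impl` is invented for the example rather than an existing configuration key.

```java
import java.util.Properties;

// Stand-in for the existing JobControl (which is constructed with a group name).
class JobControlStub {
    private final String groupName;

    JobControlStub(String groupName) {
        this.groupName = groupName;
    }

    String getGroupName() {
        return groupName;
    }

    // The default implementation would run the jobs in-process.
    String executionMode() {
        return "in-process";
    }
}

// Stand-in for the prototype subclass that would emit workflow XML instead.
class WFJobControlStub extends JobControlStub {
    WFJobControlStub(String groupName) {
        super(groupName);
    }

    @Override
    String executionMode() {
        return "workflow";
    }
}

final class JobControlFactory {
    // Hypothetical property name; not an existing Hadoop configuration key.
    static final String IMPL_KEY = "example.jobcontrol.impl";

    private JobControlFactory() {
    }

    // Returns the current implementation unless the property selects the workflow one.
    static JobControlStub create(String groupName, Properties conf) {
        String impl = conf.getProperty(IMPL_KEY, "default");
        switch (impl) {
            case "default":
                return new JobControlStub(groupName);
            case "workflow":
                return new WFJobControlStub(groupName);
            default:
                throw new IllegalArgumentException("Unknown JobControl implementation: " + impl);
        }
    }
}
```

A caller such as Pig would then write `JobControlFactory.create(groupName, conf)` where it currently calls the constructor, and the choice between in-process execution and WFAM/Oozie execution moves into configuration.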

Finally, on your comment on dynamic workflow generation, it is definitely 
doable today using the WorkflowLib API directly; I could put together an 
example early next week.
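As a rough illustration of generating a workflow definition programmatically (for instance, the JobControl-to-workflow-XML conversion mentioned above), the sketch below emits Oozie-style workflow XML for jobs that are already in dependency order. It does not use the WorkflowLib API. The class `WorkflowXmlWriter` is hypothetical, the `uri:oozie:workflow:0.4` namespace version is an assumption, the `<map-reduce>` action bodies are left as placeholders, job names are assumed to be XML-safe, and parallel branches (which would need fork/join nodes) are not handled.

```java
import java.util.List;

// Hypothetical sketch: serialize an ordered chain of jobs as Oozie-style workflow XML.
final class WorkflowXmlWriter {
    private WorkflowXmlWriter() {
    }

    static String toXml(String appName, List<String> orderedJobs) {
        if (orderedJobs.isEmpty()) {
            throw new IllegalArgumentException("workflow needs at least one job");
        }
        StringBuilder xml = new StringBuilder();
        xml.append("<workflow-app name=\"").append(appName)
           .append("\" xmlns=\"uri:oozie:workflow:0.4\">\n");
        xml.append("  <start to=\"").append(orderedJobs.get(0)).append("\"/>\n");
        for (int i = 0; i < orderedJobs.size(); i++) {
            String job = orderedJobs.get(i);
            // On success go to the next job, or to the end node after the last one.
            String next = (i + 1 < orderedJobs.size()) ? orderedJobs.get(i + 1) : "end";
            xml.append("  <action name=\"").append(job).append("\">\n")
               .append("    <map-reduce><!-- job configuration elided --></map-reduce>\n")
               .append("    <ok to=\"").append(next).append("\"/>\n")
               .append("    <error to=\"fail\"/>\n")
               .append("  </action>\n");
        }
        xml.append("  <kill name=\"fail\"><message>Workflow failed</message></kill>\n");
        xml.append("  <end name=\"end\"/>\n");
        xml.append("</workflow-app>\n");
        return xml.toString();
    }
}
```

Calling `WorkflowXmlWriter.toXml("pig-script", Arrays.asList("job1", "job2"))` produces a start node pointing at `job1`, an action for `job1` whose `ok` transition leads to `job2`, and an action for `job2` whose `ok` transition leads to `end`.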

                
> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: MAPREDUCE-4495
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4495
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.0-alpha
>            Reporter: Bo Wang
>            Assignee: Bo Wang
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, 
> MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer resources consumed, since only one application master will be 
> spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems 
> like Pig and Hive to provide an optimized way of running their workflows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
