[ https://issues.apache.org/jira/browse/OOZIE-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16490736#comment-16490736 ]

Peter Cseh commented on OOZIE-1178:
-----------------------------------

Yeah, this is for running and managing the workflow execution from a YARN AM
instead of the Oozie server, to make Oozie more scalable.
There are some issues with this though:
# How do we avoid DDoSing the database? If every WFAM has to go through the
Oozie server to talk to the database, would it help scalability at all?
# As [~andras.piros] mentioned, there are some issues with synchronous actions:
## How do we run the ssh action? Will it even be supported?
## The Email and FS actions look more manageable.
# How do we handle getting and injecting delegation tokens into the WFAM for
every action? We certainly don't want to distribute the Oozie keytab within the
YARN cluster.

There are some huge upsides to this, though: it would open up the possibility
of executing much more dynamic workflows (e.g. workflows defined by code), as
user code would run in a more contained way.


> Workflow Application Master in YARN
> -----------------------------------
>
>                 Key: OOZIE-1178
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1178
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Bo Wang
>            Priority: Major
>         Attachments: MAPREDUCE-4495-v1.1.patch, MAPREDUCE-4495-v1.patch, 
> MapReduceWorkflowAM.pdf, yapp_proposal.txt
>
>
> It is useful to have a workflow application master, which will be capable of 
> running a DAG of jobs. The workflow client submits a DAG request to the AM 
> and then the AM will manage the life cycle of this application in terms of 
> requesting the needed resources from the RM, and starting, monitoring and 
> retrying the application's individual tasks.
> Compared to running Oozie with the current MapReduce Application Master, 
> these are some of the advantages:
>  - Fewer consumed resources, since only one application master will be
> spawned for the whole workflow.
>  - Reuse of resources, since the same resources can be used by multiple 
> consecutive jobs in the workflow (no need to request/wait for resources for 
> every individual job from the central RM).
>  - More optimization opportunities in terms of collective resource requests.
>  - Optimization opportunities in terms of rewriting and composing jobs in the 
> workflow (e.g. pushing down Mappers).
>  - This Application Master can be reused/extended by higher-level systems
> like Pig and Hive to provide an optimized way of running their workflows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
