[ 
https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632228#comment-14632228
 ] 

Srikanth Sundarrajan commented on OOZIE-1770:
---------------------------------------------

Running Oozie launcher tasks as a MapReduce task/job is indeed a huge hack, and 
we should certainly take advantage of YARN and integrate with it more directly. 
Here are some of the rewards we could reap from such a direct integration:
  - Cleaner integration (no artificial split creation, no input/output exchange 
mechanisms)
  - MR's assumption that tasks are idempotent is a huge limitation, and a new 
solution should be able to overcome it
  - The heavy resource overhead of an App Master plus a launcher task for each 
action can be avoided
  - App Master restarts and task attempt relaunches today cause lost work and 
possibly data issues. These can be avoided
 
Taking a step back, here is the list of possible ways in which we can integrate 
with YARN more natively.

+Actions executed via Native Oozie App Master+
An App Master that is capable of executing an Oozie action directly, as opposed 
to making it appear as an MR job. This will in all likelihood look like the 
current MR-based execution in uber mode. It doesn't really offer much other 
than moving away from the map task execution mode.

+Actions executed via Single AM per user+
A reusable Oozie AM per user, which creates launcher containers for each action 
(as proposed by [~rkanter]). This would allow us to reduce the AM overheads and 
also reduce the launch latency (as AMs are ready and warmed up), and it would 
launch tasks more natively instead of making them appear as an MR job.

+Workflows executed via a Single AM+
Run the entire workflow in a single AM. In this mode, the workflow and all its 
actions (DagEngine) are executed on the Oozie Workflow AM. Child actions are 
executed in an action-specific thread/class loader by default, with the ability 
to execute them in a forked container instead. In this mode, Oozie workflows 
can be executed with much lower overhead, with the possibility of lowering the 
burden on the Oozie server. This of course introduces challenges in maintaining 
workflow execution state in the Oozie DB. However, these can be solved by 
maintaining state in HDFS, with notification + polling based updates from the 
Oozie server to the DB.
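
To make the state-in-HDFS idea concrete, here is a minimal sketch of the polling half only. Everything in it is an assumption made for illustration: the class and method names (WorkflowStateSync, publishStatus, pollOnce) are hypothetical and not part of Oozie, a local directory stands in for HDFS, and an in-memory Map stands in for the Oozie DB. A notification from the AM could simply trigger an early call to pollOnce.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Oozie code: the workflow AM writes one small status
 * file per workflow, and the server polls the directory and copies any changes
 * into its DB. A local directory stands in for HDFS, a Map for the Oozie DB.
 */
public class WorkflowStateSync {

    private static final String SUFFIX = ".status";

    /** AM side: publish the current status of a workflow (file name = workflow id). */
    public static void publishStatus(Path stateDir, String workflowId, String status)
            throws IOException {
        // Write to a temp file first, then rename, so a poller is unlikely to
        // read a half-written status file.
        Path tmp = stateDir.resolve(workflowId + ".tmp");
        Files.write(tmp, status.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, stateDir.resolve(workflowId + SUFFIX),
                StandardCopyOption.REPLACE_EXISTING);
    }

    /** Server side: one polling pass. Returns the number of DB entries that changed. */
    public static int pollOnce(Path stateDir, Map<String, String> db) throws IOException {
        int changed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(stateDir, "*" + SUFFIX)) {
            for (Path file : files) {
                String name = file.getFileName().toString();
                String workflowId = name.substring(0, name.length() - SUFFIX.length());
                String status =
                        new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                // Only touch the DB when the published status differs from what it holds.
                if (!status.equals(db.get(workflowId))) {
                    db.put(workflowId, status);
                    changed++;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path stateDir = Files.createTempDirectory("wf-state");
        Map<String, String> db = new HashMap<>();

        publishStatus(stateDir, "wf-1", "RUNNING");
        pollOnce(stateDir, db);
        publishStatus(stateDir, "wf-1", "SUCCEEDED");
        pollOnce(stateDir, db);

        System.out.println(db);
    }
}
```

The point of the sketch is that the AM never talks to the DB directly: it only writes files, and the server decides when to read them, which is what would let the server shed per-action work.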

My personal choice would be the last option, as it would allow workflow 
execution to be used outside of Oozie coordinators and allow the Oozie server 
to scale better, while keeping the larger objective of moving away from 
MapReduce jobs for Oozie actions. Thoughts?


> Create Oozie Application Master for YARN
> ----------------------------------------
>
>                 Key: OOZIE-1770
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1770
>             Project: Oozie
>          Issue Type: New Feature
>            Reporter: Bowen Zhang
>            Assignee: Bowen Zhang
>         Attachments: oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of oozie on hadoop 2, it will be good if users can 
> set execution engine in oozie conf, be it YARN AM or traditional MR. We can 
> target this for post oozie 4.1 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
