[
https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632228#comment-14632228
]
Srikanth Sundarrajan commented on OOZIE-1770:
---------------------------------------------
Running Oozie launcher tasks as a MapReduce task/job is indeed a huge hack, and
we should most certainly take advantage of YARN and integrate with it more
directly. Here are some of the direct benefits we should look to reap from such
an integration.
- Cleaner integration (no artificial split creation, no input and output
exchange mechanisms)
- MR's assumption that tasks are idempotent is a huge limitation, and the new
solution should be able to overcome it
- The heavy resource overhead of an App Master/launcher task for each
action can be avoided
- Issues such as App Master restarts or task attempt relaunches cause both
lost work and possibly data issues today. They can be avoided
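To make the idempotency point concrete, here is a minimal sketch in Python. It is not Oozie or YARN code, and every name in it (Cluster, naive_launcher, recovering_launcher, the journal) is hypothetical. It models a launcher whose only job is to submit a child job, and shows that a framework-level relaunch of the launcher submits the child twice unless the launcher consults some durable record of what it already submitted.

```python
# Hypothetical model, not Oozie or YARN code: a launcher attempt submits a
# child job, and the framework may relaunch that attempt after a failure.

class Cluster:
    """Stand-in for the cluster: it only records submitted child jobs."""

    def __init__(self):
        self.submitted = []

    def submit(self, name):
        self.submitted.append(name)
        return len(self.submitted)  # fake job id


def naive_launcher(cluster, action):
    # Submits unconditionally, so a relaunch produces a duplicate child job.
    return cluster.submit(action)


def recovering_launcher(cluster, action, journal):
    # Consults a journal (an assumed durable record) before submitting, so a
    # relaunched attempt reattaches to the existing child job instead.
    if action not in journal:
        journal[action] = cluster.submit(action)
    return journal[action]


def run_with_one_relaunch(launch):
    # Simulates the framework running the launcher twice: the original
    # attempt plus one relaunch, which is safe only for idempotent tasks.
    launch()
    return launch()
```

With the naive launcher a single relaunch leaves two child jobs on the cluster. With the journal-backed one there is still only one, which is the behaviour a native integration would need to provide in some form.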
Taking a step back, here is the list of possible ways in which we can integrate
with YARN more natively.
+Actions executed via Native Oozie App Master+
An App Master that is capable of executing an Oozie action directly, as opposed
to making it appear as an MR job. This will in all likelihood look like the
current MR-based execution in uber mode. It doesn't really offer much other than
moving away from the Map task execution mode.
+Actions executed via Single AM per user+
A reusable Oozie AM per user, which creates launcher containers for each action
(as proposed by [~rkanter]). This would allow us to reduce the AM overhead and
also the launch latency (as AMs are ready and warmed up), and would launch
tasks more natively rather than making them appear as an MR job.
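The saving from a per-user AM can be sketched as follows. This is a toy model in Python under stated assumptions, not a proposal for the actual implementation: AppMaster and AppMasterPool are hypothetical names, and no real YARN calls are made. The pool starts an AM the first time a user runs an action and reuses it for every later action from that user.

```python
# Hypothetical model of "one reusable AM per user"; no real YARN calls.

class AppMaster:
    def __init__(self, user):
        self.user = user
        self.containers = []  # launcher containers started by this AM

    def launch_action(self, action):
        # A real AM would request a container from YARN here.
        self.containers.append(action)
        return f"{self.user}/{action}"


class AppMasterPool:
    """Keeps at most one live AM per user and reuses it across actions."""

    def __init__(self):
        self._by_user = {}
        self.started = 0  # number of AMs actually started

    def get(self, user):
        am = self._by_user.get(user)
        if am is None:
            am = AppMaster(user)
            self._by_user[user] = am
            self.started += 1
        return am

    def run(self, user, action):
        return self.get(user).launch_action(action)
```

In this model, four actions from two users start two AMs rather than four, and only the first action of each user pays the AM startup cost.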
+Workflows executed via a Single AM+
Run the entire workflow in a single AM. In this mode, the workflow and all its
actions (DagEngine) are executed on the Oozie Workflow AM, and the child
actions are executed in an action-specific thread/class loader by default, with
the ability to execute them in a forked container instead. In this mode, Oozie
workflows can be executed with much lower overhead, with the possibility of
lowering the burden on the Oozie server. This of course introduces challenges
in maintaining workflow execution state in the Oozie DB. However, these can be
solved by maintaining state in HDFS, with notification and polling based
updates to the DB by the Oozie server.
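The polling half of that state handoff could look roughly like the sketch below. It is written in Python under several assumptions that are mine, not the proposal's: a local directory stands in for HDFS, a dict stands in for the Oozie DB, the JSON layout and the seq counter are hypothetical, and the notification path is left out entirely. The AM side publishes each state change as a file, and the server side copies anything newer than what it already holds into the DB.

```python
import json
import os
import tempfile


def write_state(state_dir, workflow_id, status, seq):
    # AM side: write to a temp file, then rename, so a reader never sees a
    # partially written file (os.replace is atomic on a local filesystem).
    path = os.path.join(state_dir, workflow_id + ".json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"id": workflow_id, "status": status, "seq": seq}, f)
    os.replace(tmp, path)


def poll_once(state_dir, db):
    # Server side: read every state file and copy newer states into the DB.
    # Returns the ids of the workflows whose DB row was updated.
    updated = []
    for name in sorted(os.listdir(state_dir)):
        if not name.endswith(".json"):
            continue  # skip in-progress ".tmp" files
        with open(os.path.join(state_dir, name)) as f:
            state = json.load(f)
        current = db.get(state["id"])
        if current is None or state["seq"] > current["seq"]:
            db[state["id"]] = state
            updated.append(state["id"])
    return updated


if __name__ == "__main__":
    state_dir = tempfile.mkdtemp()
    db = {}
    write_state(state_dir, "wf-1", "RUNNING", 1)
    poll_once(state_dir, db)
    write_state(state_dir, "wf-1", "SUCCEEDED", 2)
    poll_once(state_dir, db)
    print(db["wf-1"]["status"])
```

The sequence number makes repeated polls harmless: a poll that finds nothing newer leaves the DB untouched, so the server can poll as often or as rarely as it likes, and a notification would only need to trigger an early poll.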
My personal choice would be the last option, as it allows workflow execution
to be used outside of Oozie Coordinators, besides allowing the Oozie server to
scale better, while keeping the larger objective of moving away from MapReduce
jobs for Oozie actions. Thoughts?
> Create Oozie Application Master for YARN
> ----------------------------------------
>
> Key: OOZIE-1770
> URL: https://issues.apache.org/jira/browse/OOZIE-1770
> Project: Oozie
> Issue Type: New Feature
> Reporter: Bowen Zhang
> Assignee: Bowen Zhang
> Attachments: oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of Oozie on Hadoop 2, it would be good if users could
> set the execution engine in the Oozie conf, be it YARN AM or traditional MR.
> We can target this for a release after Oozie 4.1.