Hi all,

This is more of a longer-term idea, but work could always start sooner in a
feature branch.  I think a great flagship feature for Oozie 5 would be
running Oozie on YARN instead of MapReduce.  As you all know, the launcher
MR job is essentially a big hack and adds all kinds of overhead and
complications.  If we used AMs to run jobs directly in YARN containers
instead, that would give us so many advantages; plus, YARN's purpose is
exactly what we've been hijacking MR for.

This is obviously a large feature, and will require a major revamping of
most of the action types, all the checking code, etc.  It would be great if
we could all work on this.  We're still discussing internally how much time
we (Cloudera) can devote to this, but I wanted to gauge what others thought
of this idea and if you'd be interested in working on it.

Karthik and I already posted a hacky proof of concept, called OYA (Oozie on
YARN), to OOZIE-1770 <https://issues.apache.org/jira/browse/OOZIE-1770>; we
worked on it during a hackathon, and I imagine the patch needs some tweaking
to apply cleanly at this point.  I think it can serve as a basis for how this
would work (Karthik is working on getting the AM pool into YARN itself, so
that part wouldn't be needed anymore); the JIRA also has a list of things
still to do or improve.

Once we have Oozie on YARN, we'd be able to finally fix some of the
long-standing pain points in Oozie, which I'm pretty excited about:
- Displaying the logs from the launcher inside Oozie!  YARN has an API call
for this.
- Full control over the classpath of the launcher: we can make the sharelib
optional if the user has the necessary jars installed on all nodes
- We can run actions more similarly to how users run them (by calling their
wrapper scripts instead of their Java main classes directly), which should cut
down on the "It works from the CLI but not from Oozie" problems

Please let me know what you think.

thanks
- Robert
