I was planning on talking about this at the Oozie BoF session yesterday, but I got stuck at the YARN BoF session. Anyway, I've attached a few slides I had prepared, though it's mostly the same info as in my previous email and on OOZIE-1770.
Please let me know what you think.

- Robert

On Fri, May 1, 2015 at 1:26 PM, Robert Kanter <[email protected]> wrote:
> Hi all,
>
> This is more of a longer-term idea, but it could always be started sooner in a
> feature branch. I think a great flagship feature for Oozie 5 would be
> running Oozie on YARN instead of MapReduce. As you all know, the launcher
> MR job is essentially a big hack and adds all kinds of overhead and
> complications. If we used AMs to run jobs directly in YARN containers
> instead, that would give us so many advantages; plus, YARN's purpose is
> exactly what we've been hijacking MR for.
>
> This is obviously a large feature, and will require a major revamping of
> most of the action types, all the checking code, etc. It would be great if
> we could all work on this. We're still discussing internally how much time
> we (Cloudera) can devote to this, but I wanted to gauge what others thought
> of this idea and whether you'd be interested in working on it.
>
> Karthik and I already posted a hacky proof of concept that we worked on
> during a Hackathon to OOZIE-1770
> <https://issues.apache.org/jira/browse/OOZIE-1770>, called OYA (Oozie on
> YARN), though I imagine the patch needs some tweaking to apply cleanly at
> this point. I think that can serve as a basis for how it would work
> (Karthik is working on getting the AM pool into YARN itself so that it
> wouldn't be needed anymore); the JIRA also has a list of things still to
> do or improve.
>
> Once we have Oozie on YARN, we'd be able to finally fix some of the
> long-standing pain points in Oozie, which I'm pretty excited about:
> - Displaying the logs from the launcher inside Oozie! YARN has an API
>   call for this.
> - Full control over the classpath of the launcher: we can make the
>   sharelib optional if the user has the necessary jars installed on all nodes
> - We can run actions more similarly to how users run them (by calling
>   their wrapper scripts instead of their Java Main classes directly), which
>   should cut down on the "It works from the CLI but not from Oozie" problems
>
> Please let me know what you think.
>
> thanks
> - Robert
>
