Hi all,

This is more of a longer-term idea, but it could always be started sooner in a feature branch. I think a great flagship feature for Oozie 5 would be running Oozie on YARN instead of MapReduce. As you all know, the launcher MR job is essentially a big hack that adds all kinds of overhead and complications. If we used AMs to run jobs directly in YARN containers instead, we'd gain so many advantages; plus, what we've been hijacking MR to do is exactly what YARN is meant for.
This is obviously a large feature, and it will require a major revamp of most of the action types, all the checking code, etc. It would be great if we could all work on this. We're still discussing internally how much time we (Cloudera) can devote to it, but I wanted to gauge what others think of this idea and whether you'd be interested in working on it.

Karthik and I already posted a hacky proof of concept called OYA (Oozie on YARN), which we worked on during a Hackathon, to OOZIE-1770 <https://issues.apache.org/jira/browse/OOZIE-1770>, though I imagine the patch needs some tweaking to apply cleanly at this point. I think it can serve as a basis for how this would work (Karthik is working on getting the AM pool into YARN itself, so that part of the patch wouldn't be needed anymore). The JIRA also has a list of things that still need to be done or improved.

Once we have Oozie on YARN, we'd finally be able to fix some of the long-standing pain points in Oozie, which I'm pretty excited about:
- Displaying the logs from the launcher inside Oozie! YARN has an API call for this.
- Full control over the classpath of the launcher: we can make the sharelib optional if the user has the necessary jars installed on all nodes.
- We can run actions more similarly to how users run them (by calling their wrapper scripts instead of calling their Java main methods directly), which should cut down on the "It works from the CLI but not from Oozie" problems.

Please let me know what you think.

thanks
- Robert
