[
https://issues.apache.org/jira/browse/OOZIE-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-1770:
---------------------------------
Attachment: oya.patch
oya-rm-screenshot.jpg
We wrote the code against CDH, but I was able to port it back over to Apache
trunk without conflicts other than pom changes. I decided that it was better
to make this available sooner rather than spend too much time spiffying it up.
Here’s a brief overview of the design:
Oozie has an unmanaged AM “pool” that it uses for submitting jobs. We need a
pool because we have to create an AM for each user that submits a job (we
adapted some code from Llama). When Oozie wants to submit a job, instead of
submitting an MR launcher job, it can create/get one of these AMs and use it
to create a Yarn container, and then run the launcher in that container.
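To make the pool idea concrete, here is a minimal sketch of a per-user
create-or-get pool. It is not taken from oya.patch: UserAm, UserAmPool, and
launchInContainer are hypothetical names, and the "AM" here is only a
placeholder object rather than a real unmanaged AM registered with the RM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an unmanaged AM created on behalf of one user. */
final class UserAm {
    final String user;
    final int id;

    UserAm(String user, int id) {
        this.user = user;
        this.id = id;
    }

    /** Placeholder: a real AM would request a container and start the launcher in it. */
    String launchInContainer(String actionName) {
        return "am-" + id + "(" + user + ") -> " + actionName;
    }
}

/** Hypothetical per-user pool: creates an AM on first use, then reuses it. */
final class UserAmPool {
    private final Map<String, UserAm> ams = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Returns the submitting user's AM, creating it only if none exists yet. */
    UserAm getOrCreate(String user) {
        return ams.computeIfAbsent(user, u -> new UserAm(u, nextId.getAndIncrement()));
    }

    int size() {
        return ams.size();
    }
}

final class UserAmPoolDemo {
    public static void main(String[] args) {
        UserAmPool pool = new UserAmPool();
        UserAm first = pool.getOrCreate("alice");
        UserAm second = pool.getOrCreate("alice");
        // The same user gets the same AM back on the second submission.
        System.out.println(first == second);
        System.out.println(pool.getOrCreate("bob").launchInContainer("java-action"));
    }
}
```

Keying the map on the user name is what gives one AM per submitting user;
computeIfAbsent keeps the create-or-get step atomic when several submissions
arrive at once.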
During our testing, we were using a Java action that launches a simple MR job.
In the screenshot below, you can see that we have the one “OozieServer” AM, and
then 3 MAPREDUCE applications, from when we ran the workflow 3 times. The
OozieServer AM was reused each time to submit the MR jobs, and there’s no
longer a Launcher Job.
!oya-rm-screenshot.jpg!
Given that this was more of a proof-of-concept and we didn’t have a lot of
time, we didn’t redo the launcher code. It still uses LauncherMapper; I just
hacked in some extra methods for running it outside of a map task so we could
run it in the container. This is definitely an area where we can improve
things a lot. One major thing to keep in mind is that the container gives us a
Shell; right now, we’re then starting a JVM to run the LauncherMapper code, but
it probably would make sense to see if we can skip the JVM and run most actions
directly in the shell.
Interestingly, a container that doesn’t do much (e.g. a “Hello World” Java
action) runs so fast that Oozie is now the bottleneck. The callback comes in
before the action has had a chance to transition to RUNNING, so Oozie
complains. We worked around this by adding a delay, but we’ll probably want a
better fix.
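As a rough illustration of the race, here is a sketch of a callback handler
that waits a bounded amount of time for the action to reach RUNNING before
applying the callback. This is a bounded-polling variant of the delay, not
what the patch does; Action, Status, and handleCallback are hypothetical names
and the timings are arbitrary.

```java
import java.util.concurrent.atomic.AtomicReference;

final class CallbackDelaySketch {
    enum Status { PREP, RUNNING, DONE }

    /** Hypothetical action record; only the status field matters here. */
    static final class Action {
        final AtomicReference<Status> status = new AtomicReference<>(Status.PREP);
    }

    /**
     * Polls every pollMs, for at most maxWaitMs, until the action is RUNNING,
     * then marks it DONE. Returns false if the action never reached RUNNING
     * in time or the waiting thread was interrupted.
     */
    static boolean handleCallback(Action action, long maxWaitMs, long pollMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!action.status.compareAndSet(Status.RUNNING, Status.DONE)) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        Action action = new Action();
        // Simulate the PREP -> RUNNING transition finishing after the callback arrives.
        Thread transition = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            action.status.set(Status.RUNNING);
        });
        transition.start();
        System.out.println(handleCallback(action, 2000, 10));
        transition.join();
    }
}
```

Compared with a fixed sleep, the bounded wait returns as soon as the
transition completes and still gives up cleanly if it never does.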
I had to remove MR1 support, obviously. This greatly simplifies the build
because we don’t need hadooplibs anymore. I also had to up the Hadoop version
to 2.4.0+ from 2.3.0.
We can probably get rid of this eventually, but for now, I made it so that if
you set {{oozie.use.jobclient.launch=true}} in oozie-site, Oozie will use the
old MR launcher job behavior instead of the container behavior.
Here’s a to-do list of things that need to be fixed or improved:
- The container currently runs as the Yarn user regardless of who submitted the
workflow. So, permissions-wise, workflows only work if you submit them as the
Yarn user.
- The shell action should not start a JVM
- Actions should be refactored/cleaned-up/simplified to not need all the stuff
LauncherMapper is doing
-- Regardless of whether or not we still start a JVM, there’s a bunch
of stuff LauncherMapper does that we should get rid of or simplify or change
-- The Shell action definitely doesn’t need a JVM; once the user
problem is fixed, it should also finally run as the proper user!
- The NMToken expires after 10-15 minutes, after which you can’t submit
containers anymore from the AMs in the AM pool
- The Oozie kill command still expects an MR job and doesn’t work; it needs to
use the container id
- The status checking code is pretty hacky. I modified it to check the
container status, but it doesn’t check if the job actually failed or not
- The callback is pretty hacky. The Launcher’s MR AM was sending the callback
to Oozie before. I had to make the container do this now
-- It may make sense to come up with a different mechanism for this
- I haven’t tried any recovery stuff or Oozie HA
- I had to add two columns: one to store node Id host and one to store node Id
port. OozieDBCLI needs to be updated to create these during an upgrade.
Creating a new database works fine though because OpenJPA handles it.
- We’ve only tested with the Java action. Most of the other actions should
work with some minor tweaking (need to set expected extra env vars, etc). The
MR action probably won’t work because of the swapping optimization; it probably
makes sense to get rid of that.
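For the status-checking item above, one possible direction is to look at the
container's exit status as well as its state. The sketch below uses local
stand-in types (State mirrors YARN's NEW/RUNNING/COMPLETE container states;
Outcome and classify are hypothetical) and assumes the launcher reports
failure through a non-zero exit status, which the current code does not
guarantee.

```java
final class ContainerOutcomeSketch {
    /** Local stand-in for the container states reported by YARN. */
    enum State { NEW, RUNNING, COMPLETE }

    /** Hypothetical action-level result derived from the container report. */
    enum Outcome { RUNNING, SUCCEEDED, FAILED }

    /**
     * Treats any non-COMPLETE container as still running; for a COMPLETE
     * container, exit status 0 counts as success and anything else as
     * failure (assumption: the launcher exits non-zero when the job fails).
     */
    static Outcome classify(State state, int exitStatus) {
        if (state != State.COMPLETE) {
            return Outcome.RUNNING;
        }
        return exitStatus == 0 ? Outcome.SUCCEEDED : Outcome.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(classify(State.RUNNING, 0));
        System.out.println(classify(State.COMPLETE, 0));
        System.out.println(classify(State.COMPLETE, 1));
    }
}
```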
The oya.patch has all of the changes.
> Create Oozie Application Master for YARN
> ----------------------------------------
>
> Key: OOZIE-1770
> URL: https://issues.apache.org/jira/browse/OOZIE-1770
> Project: Oozie
> Issue Type: New Feature
> Reporter: Bowen Zhang
> Assignee: Bowen Zhang
> Attachments: oya-rm-screenshot.jpg, oya.patch
>
>
> After the first release of oozie on hadoop 2, it will be good if users can
> set execution engine in oozie conf, be it YARN AM or traditional MR. We can
> target this for post oozie 4.1 release.