[
https://issues.apache.org/jira/browse/OOZIE-1722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Kanter updated OOZIE-1722:
---------------------------------
Attachment: OOZIE-1722.patch
OOZIE-1722.patch
Most of the changes in the patch are for the Hadoop Utils module stuff; the
actual code changes are pretty minimal.
Here's more detail on how this all works:
- The {{JavaActionExecutor}} sets the yarn tag property to use for each
launcher. We can't use the action's id because it might be too long, so we
hash it to a fixed length.
- Before actually running an action's client, the launcher calls
{{LauncherMainHadoopUtils.killChildYarnJobs(...)}}. In the Hadoop-1, Hadoop-2,
and Hadoop-23 versions, this does nothing; in the Hadoop-3 version, this
finds any jobs with the corresponding tag and kills them.
-- In the MR action, it's a little different, but the idea is the same. The
launcher instead calls
{{LauncherMainHadoopUtils.checkHasYarnJobsForMapReduceAction(...)}}, which
returns true if the action job is running (for Hadoop-3) and false otherwise.
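As a rough illustration of the steps above (not the code in the patch), the sketch below hashes an action id to a fixed-length tag, then either kills or merely detects jobs carrying that tag. Everything in it is an assumption made for the example: the class {{LauncherTagSketch}}, the "oozie-" prefix, the choice of MD5, and the in-memory map of job id to tag that stands in for the cluster.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: names, prefix, and hash choice are assumptions, not the patch.
final class LauncherTagSketch {

    // Hypothetical prefix; the real tag format is not described in this comment.
    static final String TAG_PREFIX = "oozie-";

    /** Hash an arbitrarily long action id to a fixed-length tag (prefix + 32 hex chars). */
    static String tagFor(String actionId) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(actionId.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(TAG_PREFIX);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Ids of running jobs that carry the tag; runningJobs maps job id to tag. */
    static List<String> findChildJobs(Map<String, String> runningJobs, String tag) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : runningJobs.entrySet()) {
            if (tag.equals(e.getValue())) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    /** Pig/Sqoop/Hive style: kill leftovers before starting over; returns the killed ids. */
    static List<String> killChildJobs(Map<String, String> runningJobs, String tag) {
        List<String> killed = findChildJobs(runningJobs, tag);
        runningJobs.keySet().removeAll(killed); // stand-in for asking the cluster to kill them
        return killed;
    }

    /** MR style: only report whether a previously launched job is still running. */
    static boolean hasChildJobs(Map<String, String> runningJobs, String tag) {
        return !findChildJobs(runningJobs, tag).isEmpty();
    }
}
```

A launcher for a non-MR action would call {{killChildJobs}} before running its client, while the MR launcher would call {{hasChildJobs}} and exit early if it returns true.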
I've tested that it works properly against a Hadoop with the yarn tag patches
(it wasn't Hadoop-3 though) and that it doesn't break Hadoop-1; I also verified
that the proper Hadoop Utils jar ends up in the Oozie sharelib. I didn't write
any tests because I think any test would be flaky (the timing would be tricky),
but if anyone has any good ideas for tests, let me know.
> When an ApplicationMaster restarts, it restarts the launcher job
> ----------------------------------------------------------------
>
> Key: OOZIE-1722
> URL: https://issues.apache.org/jira/browse/OOZIE-1722
> Project: Oozie
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Robert Kanter
> Assignee: Robert Kanter
> Attachments: OOZIE-1722.patch, OOZIE-1722.patch
>
>
> When using Yarn, there are some situations in which the ApplicationMaster can
> be restarted (e.g. RM failover, the AM dies and another attempt is made,
> etc).
> When this happens, it starts the launcher job again, which will start over.
> So, if that launcher has already launched a job, we'll end up with two
> instances of the same job, which can be problematic. For example, if you
> have a Pig action, the Pig client might run a job, but then the launcher gets
> restarted by an AM restart and launches that same job again.
> We don't have a way of "re-attaching" to previously launched jobs; however,
> with YARN-1461 and MAPREDUCE-5699, we can use yarn tags to find anything the
> launcher previously launched that's running and kill them. We still have to
> start over, but at least we're not running two instances of a job at the same
> time.
> Here's what we can do for each action type:
> - Pig, Sqoop, Hive
> -- Kill previously launched jobs and start over
> - MapReduce (different because of the optimization)
> -- Exit launcher if a previously launched job already exists
> - Java, Shell
> -- No out-of-the-box support for this
> -- Like with other things, the Java action can take advantage of this like
> Pig, Sqoop, and Hive if the user adds some code
> - DistCp
> -- Not supported
> - SSH, Email
> -- N/A
> The yarn tags won't be available until Hadoop 2.4.0, but are in the nightly
> (i.e. Hadoop 3.0.0-SNAPSHOT); and they're obviously not in Hadoop 1.x. To be
> able to use the Yarn methods and the new methods for tagging, we can add a
> new type of Hadooplib called "Hadoop Utils" where we can put classes that are
> specific to a particular version of Hadoop; the other implementations can have
> dummy versions. For example, in the Hadoop-2 Hadoop Utils, we can put a
> method foo() that calls some yarn stuff but in the Hadoop-1 Hadoop Utils, the
> foo() method would either do the equivalent in MR1 or a no-op. So for now, I
> put some methods in the Hadoop-3 Hadoop Utils that use the tags and the
> Hadoop-1, Hadoop-2, and Hadoop-23 Hadoop Utils all have dummy implementations
> that don't do anything (so the existing behavior is preserved). The Hadoop
> Utils modules will allow us to take advantage of Hadoop 2 only features in
> the future, while still being able to compile against Hadoop 1; so it's not
> just limited to this feature.
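The "Hadoop Utils" idea described above can be sketched in a single file as one contract with a no-op variant and a tag-aware variant. This is only an illustration under stated assumptions: in the real layout each Hadooplib module would presumably ship its own class under the same name, so the interface {{HadoopUtilsSketch}} and both implementation names here are hypothetical, and the list of running jobs stands in for the cluster.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common contract that every version-specific module satisfies. */
interface HadoopUtilsSketch {
    /** Kill jobs carrying the tag and return their ids. */
    List<String> killChildYarnJobs(String tag);
}

/** Stand-in for the Hadoop-1, Hadoop-2, and Hadoop-23 variants: existing behavior is preserved. */
final class NoOpHadoopUtils implements HadoopUtilsSketch {
    @Override
    public List<String> killChildYarnJobs(String tag) {
        return new ArrayList<>(); // nothing to find, nothing to kill
    }
}

/** Stand-in for the Hadoop-3 variant, which can look jobs up by tag. */
final class TagAwareHadoopUtils implements HadoopUtilsSketch {
    // Each entry is {jobId, tag}; a placeholder for querying the cluster.
    private final List<String[]> running;

    TagAwareHadoopUtils(List<String[]> running) {
        this.running = running;
    }

    @Override
    public List<String> killChildYarnJobs(String tag) {
        List<String> killed = new ArrayList<>();
        // removeIf drops matching entries and records their ids as "killed".
        running.removeIf(job -> {
            boolean match = job[1].equals(tag);
            if (match) {
                killed.add(job[0]);
            }
            return match;
        });
        return killed;
    }
}
```

Caller code depends only on the shared method signature, so it compiles the same way whichever variant ends up on the classpath.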