[jira] [Created] (OOZIE-1722) When an ApplicationMaster restarts, it restarts the launcher job

Robert Kanter (JIRA) Fri, 28 Feb 2014 13:48:13 -0800

Robert Kanter created OOZIE-1722:
------------------------------------

             Summary: When an ApplicationMaster restarts, it restarts the launcher job
                 Key: OOZIE-1722
                 URL: https://issues.apache.org/jira/browse/OOZIE-1722
             Project: Oozie
          Issue Type: Improvement
    Affects Versions: trunk
            Reporter: Robert Kanter
            Assignee: Robert Kanter



When using YARN, there are some situations in which the ApplicationMaster can 
be restarted (e.g. RM failover, or the AM dies and another attempt is made).

When this happens, the restarted AM runs the launcher job again from the 
beginning.  So, if that launcher has already launched a job, we'll end up with 
two instances of the same job, which can be problematic.  For example, if you 
have a Pig action, the Pig client might run a job, but then the launcher gets 
restarted by an AM restart and launches that same job again.

We don't have a way of "re-attaching" to previously launched jobs; however, 
with YARN-1461 and MAPREDUCE-5699, we can use YARN tags to find any jobs the 
launcher previously launched that are still running and kill them.  We still 
have to start over, but at least we're not running two instances of a job at 
the same time.
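
As a rough illustration of the find-and-kill step, here is a minimal sketch in 
plain Java.  It does not call the real YARN client API: the App class, the 
killTaggedChildren() method, and the "launcher-42" tag string are all made up 
for this example, and an in-memory list stands in for the ResourceManager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Sketch: on launcher start, kill still-running jobs tagged by an earlier attempt. */
public class LauncherRestartSketch {

    /** Hypothetical stand-in for a YARN application and its tags. */
    public static final class App {
        public final String id;
        public final Set<String> tags;
        public boolean running;

        public App(String id, Set<String> tags, boolean running) {
            this.id = id;
            this.tags = tags;
            this.running = running;
        }
    }

    /**
     * Marks every running app that carries launcherTag as killed and returns
     * the ids of the apps it killed. A real implementation would query the
     * ResourceManager rather than walk a list.
     */
    public static List<String> killTaggedChildren(List<App> cluster, String launcherTag) {
        List<String> killed = new ArrayList<>();
        for (App app : cluster) {
            if (app.running && app.tags.contains(launcherTag)) {
                app.running = false; // stands in for a kill request to the RM
                killed.add(app.id);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        List<App> cluster = new ArrayList<>();
        cluster.add(new App("app_1", Collections.singleton("launcher-42"), true));
        cluster.add(new App("app_2", Collections.singleton("launcher-99"), true));
        // Only the job tagged by our own launcher is killed.
        System.out.println(killTaggedChildren(cluster, "launcher-42"));
    }
}
```

The launcher would run this check before launching anything, so a restarted 
attempt always begins with none of its earlier jobs still running.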

Here's what we can do for each action type:
- Pig, Sqoop, Hive
-- Kill previously launched jobs and start over
- MapReduce (different because of the optimization)
-- Exit launcher if a previously launched job already exists
- Java, Shell
-- No out-of-the-box support for this
-- As with other features, the Java action can take advantage of this the way 
Pig, Sqoop, and Hive do if the user adds some code
- DistCp
-- Not supported
- SSH, Email
-- N/A
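
The list above can be read as a small lookup from action type to restart 
behavior.  The sketch below only restates that list in Java; the enum and 
method names are invented for illustration and are not Oozie classes.

```java
/** Sketch: the per-action restart behavior from the list, as a lookup. */
public class RestartPolicySketch {

    /** Action types named in the list (hypothetical enum, not an Oozie type). */
    public enum ActionType { PIG, SQOOP, HIVE, MAP_REDUCE, JAVA, SHELL, DISTCP, SSH, EMAIL }

    /** What the launcher does about previously launched jobs after an AM restart. */
    public enum OnRestart {
        KILL_CHILDREN_AND_START_OVER, // Pig, Sqoop, Hive
        EXIT_IF_CHILD_EXISTS,         // MapReduce (because of the optimization)
        NO_BUILT_IN_SUPPORT,          // Java, Shell (Java can opt in via user code)
        NOT_SUPPORTED,                // DistCp
        NOT_APPLICABLE                // SSH, Email
    }

    public static OnRestart policyFor(ActionType type) {
        switch (type) {
            case PIG:
            case SQOOP:
            case HIVE:
                return OnRestart.KILL_CHILDREN_AND_START_OVER;
            case MAP_REDUCE:
                return OnRestart.EXIT_IF_CHILD_EXISTS;
            case JAVA:
            case SHELL:
                return OnRestart.NO_BUILT_IN_SUPPORT;
            case DISTCP:
                return OnRestart.NOT_SUPPORTED;
            default: // SSH, EMAIL
                return OnRestart.NOT_APPLICABLE;
        }
    }
}
```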

YARN tags won't be available until Hadoop 2.4.0, but they are in the nightly 
(i.e. Hadoop 3.0.0-SNAPSHOT); and they're obviously not in Hadoop 1.x.  To be 
able to use the YARN methods and the new methods for tagging, we can add a new 
type of Hadooplib called "Hadoop Utils" where we can put classes that are 
specific to a particular version of Hadoop; the other implementations can have 
dummy versions.  For example, in the Hadoop-2 Hadoop Utils, we can put a method 
foo() that calls some YARN APIs, but in the Hadoop-1 Hadoop Utils, the foo() 
method would either do the equivalent in MR1 or be a no-op.  So for now, I put 
some methods in the Hadoop-3 Hadoop Utils that use the tags, and the Hadoop-1, 
Hadoop-2, and Hadoop-23 Hadoop Utils all have dummy implementations that don't 
do anything (so the existing behavior is preserved).  The Hadoop Utils modules 
will allow us to take advantage of Hadoop 2-only features in the future, while 
still being able to compile against Hadoop 1; so it's not just limited to this 
feature.
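
To make the Hadoop Utils idea concrete, here is a minimal sketch of the shape 
it could take.  All names are hypothetical.  In the layout described above, 
each Hadoop Utils module would presumably ship its own class under the same 
name and the build would pick one; the sketch puts both variants behind one 
interface in a single file so that it compiles on its own, and uses an 
in-memory list in place of any real YARN call.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: version-specific helpers behind one signature, with a no-op dummy. */
public class HadoopUtilsSketch {

    /** Hypothetical version-neutral signature that core code compiles against. */
    public interface LauncherUtils {
        /** Kills jobs launched by an earlier attempt; returns how many were killed. */
        int killChildJobs(String launcherTag);
    }

    /** Dummy for Hadoop versions without tags: does nothing, so behavior is unchanged. */
    public static final class NoOpLauncherUtils implements LauncherUtils {
        @Override
        public int killChildJobs(String launcherTag) {
            return 0;
        }
    }

    /** Tag-aware variant; runningTags stands in for a ResourceManager query. */
    public static final class TaggingLauncherUtils implements LauncherUtils {
        private final List<String> runningTags;

        public TaggingLauncherUtils(List<String> runningTags) {
            this.runningTags = new ArrayList<>(runningTags);
        }

        @Override
        public int killChildJobs(String launcherTag) {
            int before = runningTags.size();
            // Dropping an entry stands in for killing the job that carries the tag.
            runningTags.removeIf(launcherTag::equals);
            return before - runningTags.size();
        }
    }
}
```

Callers only ever see killChildJobs(), so the launcher code stays the same 
whichever Hadoop version it is built against.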



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
