Rahul Jain created MAPREDUCE-4428:
-------------------------------------

             Summary: A failed job is not available in job history if the job 
is killed right around the time the job is notified as failed 
                 Key: MAPREDUCE-4428
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobhistoryserver, jobtracker
    Affects Versions: 2.0.0-alpha
            Reporter: Rahul Jain


We have observed this issue consistently running the Hadoop CDH4 distribution 
(based on the 2.0.0-alpha release):

When our Hadoop client code receives a notification that a job has completed 
(using a RunningJob object job, with job.isComplete() && job.isSuccessful() == false), 
the client code unconditionally calls job.killJob() to terminate the job.
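A minimal sketch of the client pattern described above, using the classic org.apache.hadoop.mapred RunningJob API; the method name and poll interval are illustrative, not from our actual code:

```java
import java.io.IOException;

import org.apache.hadoop.mapred.RunningJob;

public class JobMonitor {

    // Polls the job until it completes, then kills it if it did not succeed.
    // Calling killJob() on a job that has already failed is what appears to
    // trigger the race described below: the job can vanish from job history.
    static void waitAndKillOnFailure(RunningJob job)
            throws IOException, InterruptedException {
        while (!job.isComplete()) {
            Thread.sleep(5000); // poll interval; illustrative only
        }
        if (!job.isSuccessful()) {
            // Unconditional kill of an already-failed job; removing this
            // call avoided the lost-history problem for us (see below).
            job.killJob();
        }
    }
}
```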

With earlier Hadoop versions (verified on Hadoop 0.20.2), we still have full 
access to the job logs afterwards through the Hadoop console. However, with 
MapReduce v2, the failed job no longer shows up in the job history server, 
and the job's tracking URL still points to the HTTP port of the now-nonexistent 
ApplicationMaster.

Once we removed the job.killJob() call for failed jobs from our Hadoop client 
code, we were able to access the job in job history with MapReduce v2 as well. 
This therefore appears to be a race condition in job management with respect 
to job history for failed jobs.

We have ApplicationMaster and NodeManager logs collected for this scenario if 
that will help isolate the problem and the fix.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
