[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889206#comment-15889206
 ] 

Jian He commented on MAPREDUCE-6852:
------------------------------------

Looks like getJobID is already used in several other places in the same class, so we may 
just use this method.

> Job#updateStatus() failed with NPE due to race condition
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6852
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-6852.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where a Pig query 
> occasionally failed with an NPE: "Pig uses JobControl API to track MR job status, 
> but sometimes Job History Server failed to flush job meta files to HDFS, which 
> caused the status update to fail." Besides the NPE in 
> o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(); the 
> exception is as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
>       at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>       at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>       at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>       at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found that state is null here. However, we already check that the job state is 
> RUNNING, as in the code below:
> {noformat}
>   public boolean isComplete() throws IOException {
>     ensureState(JobState.RUNNING);
>     updateStatus();
>     return status.isJobComplete();
>   }
> {noformat}
> The only possible explanation is that two threads call this method at the same 
> time: both pass the state check first, then one thread updates the state to null 
> while the other thread hits the NPE.
> We should fix this NPE.
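
To make the interleaving described above concrete, here is a minimal sketch in plain Java. It is not the Hadoop code and not the attached patch: StatusRaceSketch, its nested JobStatus, Client and Job types, and the methods updateStatusRacy and updateStatusSafe are all hypothetical stand-ins. The sketch assumes the shared field can be overwritten with null when the client returns nothing, and it replays the two-thread interleaving sequentially so the outcome is deterministic.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration only; these are not the real Hadoop classes.
public class StatusRaceSketch {

    static final class JobStatus {
        private final String jobId;
        private final boolean complete;

        JobStatus(String jobId, boolean complete) {
            this.jobId = jobId;
            this.complete = complete;
        }

        String getJobID() { return jobId; }
        boolean isJobComplete() { return complete; }
    }

    // Assumption: the remote lookup may return null when job metadata is missing.
    interface Client {
        JobStatus getJobStatus(String jobId);
    }

    static final class Job {
        private final String jobId;        // set once, never nulled
        private final Client client;
        private volatile JobStatus status; // shared and mutable

        Job(String jobId, Client client) {
            this.jobId = jobId;
            this.client = client;
            this.status = new JobStatus(jobId, false);
        }

        String getJobID() { return jobId; }

        // Racy variant: dereferences the shared field, which another caller
        // may already have overwritten with null.
        void updateStatusRacy() {
            status = client.getJobStatus(status.getJobID());
        }

        // Safer variant: takes the ID from a stable source and never stores null.
        void updateStatusSafe() throws IOException {
            JobStatus fetched = client.getJobStatus(getJobID());
            if (fetched == null) {
                throw new IOException("Job status not available for " + jobId);
            }
            status = fetched;
        }

        boolean isComplete() throws IOException {
            updateStatusSafe();
            return status.isJobComplete();
        }
    }

    public static void main(String[] args) {
        Client alwaysNull = id -> null;

        // "Thread A" stores null; "thread B" then dereferences it.
        Job racy = new Job("job_1", alwaysNull);
        racy.updateStatusRacy();
        try {
            racy.updateStatusRacy();
        } catch (NullPointerException e) {
            System.out.println("racy variant: NPE on the second update");
        }

        // The safer variant reports the missing status as a checked exception.
        Job safe = new Job("job_1", alwaysNull);
        try {
            safe.isComplete();
        } catch (IOException e) {
            System.out.println("safe variant: " + e.getMessage());
        }
    }
}
```

The design choice in the safer variant is twofold: the job ID comes from a value that is never reset, and a null lookup result is surfaced as an IOException instead of being written into the shared field. Whether throwing is the right behaviour for the real Job class is left open here.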



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

