[ 
https://issues.apache.org/jira/browse/MAPREDUCE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756475#action_12756475
 ] 

Hong Tang commented on MAPREDUCE-995:
-------------------------------------

The bug seems trickier than I thought...

There is an obvious race condition:
Thread 1:
{code}
      writer.close();
      fileMap.get(id).clearEventWriter();
{code}

Thread 2:
{code}
     MetaInfo mi = fileMap.get(jobId);
     if (mi == null || (writer = mi.getEventWriter()) == null) {
{code}

writer.close() should be inside clearEventWriter(), and clearEventWriter() 
should close the writer and set it to null atomically. Another problem is that 
thread 2 may get a valid writer instance that thread 1 then closes while 
thread 2 is still using it. So I think the right fix would have to synchronize 
on the MetaInfo for both clearing the event writer and writing logs: logEvent 
would have to be implemented on MetaInfo and synchronized with 
clearEventWriter().
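
A minimal sketch of that idea, not the actual patch. MetaInfoSketch is a 
hypothetical stand-in for MetaInfo, a plain java.io.Writer stands in for the 
real event writer, and the boolean return value of logEvent is an assumption 
made only for illustration. The point is that the null check, the write, and 
the close-and-clear all happen under the same monitor:

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical stand-in for the per-job MetaInfo entry; not the Hadoop class.
class MetaInfoSketch {
    // Guarded by "this": only touched inside synchronized methods.
    private Writer writer;

    MetaInfoSketch(Writer writer) {
        this.writer = writer;
    }

    // Writes one event under the lock. Returns false if the writer has
    // already been cleared, so a late task event is dropped rather than
    // causing an NPE or a write to a closed writer.
    synchronized boolean logEvent(String event) throws IOException {
        if (writer == null) {
            return false;
        }
        writer.write(event);
        writer.write('\n');
        return true;
    }

    // Closes the writer and nulls the reference as one atomic step with
    // respect to logEvent, since both hold the same monitor.
    synchronized void clearEventWriter() throws IOException {
        if (writer != null) {
            try {
                writer.close();
            } finally {
                writer = null;
            }
        }
    }
}
```

With this shape, a caller would only look up the entry in fileMap and call 
logEvent on it; it never holds a writer reference of its own, so thread 1 
cannot close a writer out from under thread 2.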

> JobHistory should handle cases where task completion events are generated 
> after job completion event
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-995
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-995
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jothi Padmanabhan
>            Assignee: Jothi Padmanabhan
>         Attachments: mapred-995.patch
>
>
> It is apparently possible, in certain circumstances (failed job, for 
> example), for the job history to get task completion events after the job 
> completion event. This currently causes NPE in job history.
> Thanks Hong for identifying this issue

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
