[
https://issues.apache.org/jira/browse/MAPREDUCE-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756475#action_12756475
]
Hong Tang commented on MAPREDUCE-995:
-------------------------------------
The bug seems trickier than I thought...
There is an obvious race condition:
Thread 1:
{code}
writer.close();
fileMap.get(id).clearEventWriter();
{code}
Thread 2:
{code}
MetaInfo mi = fileMap.get(jobId);
if (mi == null || (writer = mi.getEventWriter()) == null) {
{code}
writer.close() should move inside clearEventWriter(), which should close the
writer and set it to null atomically. Another problem is that thread 2 may
obtain a valid writer instance that thread 1 then closes before thread 2 gets
to use it. So I think the right fix has to synchronize on the MetaInfo for both
clearing the event writer and writing logs - logEvent would have to be
implemented on MetaInfo, and synchronized with clearEventWriter().
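A rough sketch of what I have in mind, not the actual patch. EventWriter here is a hypothetical stand-in that only records calls, JobHistorySketch and closeWriter are made-up names, and the boolean return from logEvent is my own assumption about how a late event could be reported instead of causing an NPE:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the real event writer; it only records calls.
class EventWriter {
    final List<String> events = new ArrayList<>();
    boolean closed = false;

    void write(String event) {
        if (closed) {
            throw new IllegalStateException("write after close");
        }
        events.add(event);
    }

    void close() {
        closed = true;
    }
}

class MetaInfo {
    private EventWriter writer; // guarded by the MetaInfo monitor

    MetaInfo(EventWriter writer) {
        this.writer = writer;
    }

    // Close the writer and drop the reference in one critical section,
    // so no other thread can observe a closed but non-null writer.
    synchronized void clearEventWriter() {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    // Check and write under the same lock as clearEventWriter().
    // Returns false when the writer is already gone (late event).
    synchronized boolean logEvent(String event) {
        if (writer == null) {
            return false;
        }
        writer.write(event);
        return true;
    }
}

class JobHistorySketch {
    static final Map<String, MetaInfo> fileMap = new ConcurrentHashMap<>();

    // Thread 2 path: never touches the writer outside MetaInfo's lock.
    static boolean logEvent(String jobId, String event) {
        MetaInfo mi = fileMap.get(jobId);
        return mi != null && mi.logEvent(event);
    }

    // Thread 1 path: a single call replaces close() + clearEventWriter().
    static void closeWriter(String jobId) {
        MetaInfo mi = fileMap.get(jobId);
        if (mi != null) {
            mi.clearEventWriter();
        }
    }
}
```

With this shape a task completion event that arrives after job completion is simply dropped (logEvent returns false); whether it should be dropped, logged elsewhere, or counted is a separate decision.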
> JobHistory should handle cases where task completion events are generated
> after job completion event
> ----------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-995
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jothi Padmanabhan
> Assignee: Jothi Padmanabhan
> Attachments: mapred-995.patch
>
>
> It is apparently possible, in certain circumstances (failed job, for
> example), for the job history to get task completion events after the job
> completion event. This currently causes NPE in job history.
> Thanks Hong for identifying this issue