[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15440372#comment-15440372
 ] 

Haibo Chen commented on MAPREDUCE-6771:
---------------------------------------

Analysis:
{code:java}
RMContainerAllocator.getResources() {
  ...
    for (ContainerStatus cont : finishedContainers) {
      LOG.info("Received completed container " + cont.getContainerId());
      TaskAttemptId attemptID = assignedRequests.get(cont.getContainerId());
      if (attemptID == null) {
        LOG.error("Container complete event for unknown container id "
            + cont.getContainerId());
      } else {
        pendingRelease.remove(cont.getContainerId());
        assignedRequests.remove(attemptID);
        
        // send the container completed event to Task attempt
        eventHandler.handle(createContainerFinishedEvent(cont, attemptID));
        
        // Send the diagnostics
        String diagnostics = StringInterner.weakIntern(cont.getDiagnostics());
        eventHandler.handle(new TaskAttemptDiagnosticsUpdateEvent(attemptID,
            diagnostics));

        preemptionPolicy.handleCompletedContainer(attemptID);
      }
  ...
}
{code}
The scenario in question is as follows: a job is running, and one of its task 
attempts, running on a NM, is killed by the NM because the container exceeds 
its resource limit. The container status/diagnostics is sent by the NM to the 
RM and then passed on to the MR AM in the AM's periodic heartbeat with the RM, 
as shown above. At that point the task attempt is still in the RUNNING state 
from the AM's perspective, since the task heartbeat has not timed out.

Upon learning from the RM that the task attempt's container has finished, the 
RMCommunicator thread places a ContainerFinishedEvent and then a 
TaskAttemptDiagnosticsUpdateEvent in the event queue.

The ContainerFinishedEvent causes the task attempt in the MR AM to transition 
from RUNNING to FAILED, and a TaskAttemptUnsuccessfulCompletionEvent carrying 
whatever diagnostics the attempt has at that moment to be written to the .jhist 
file. The TaskAttemptDiagnosticsUpdateEvent then updates the diagnostics 
information associated with the task attempt in memory.

But because the ContainerFinishedEvent is placed and processed before the 
TaskAttemptDiagnosticsUpdateEvent, the TaskAttemptUnsuccessfulCompletionEvent 
written to the .jhist file does not contain the diagnostics info received from 
the RM.
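
To make the ordering concrete, here is a minimal, self-contained sketch of the 
problem. TaskAttempt, HistoryRecord and Event below are illustrative stand-ins, 
not the real MR AM classes; the point is only that the first event snapshots 
the diagnostics into the history record before the second event delivers them:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

public class DiagnosticsOrderingDemo {

  // Stand-in for the task attempt state kept in the MR AM.
  static class TaskAttempt {
    String state = "RUNNING";
    StringBuilder diagnostics = new StringBuilder();
  }

  // Stand-in for the TaskAttemptUnsuccessfulCompletionEvent written to .jhist.
  static class HistoryRecord {
    final String state;
    final String diagnostics;
    HistoryRecord(String state, String diagnostics) {
      this.state = state;
      this.diagnostics = diagnostics;
    }
  }

  interface Event { void handle(TaskAttempt attempt); }

  public static void main(String[] args) {
    TaskAttempt attempt = new TaskAttempt();
    Queue<Event> eventQueue = new ArrayDeque<>();
    HistoryRecord[] jhist = new HistoryRecord[1];   // the "history file"

    // 1. Container-finished event: RUNNING -> FAILED, and the history record
    //    is written immediately with whatever diagnostics the attempt has now.
    eventQueue.add(a -> {
      a.state = "FAILED";
      jhist[0] = new HistoryRecord(a.state, a.diagnostics.toString());
    });

    // 2. Diagnostics-update event: appends the NM diagnostics, but only to the
    //    in-memory attempt; the history record has already been written.
    eventQueue.add(a -> a.diagnostics.append(
        "Container killed by the NodeManager: running beyond memory limits"));

    // Events are dispatched strictly in FIFO order, as in the AM dispatcher.
    while (!eventQueue.isEmpty()) {
      eventQueue.poll().handle(attempt);
    }

    System.out.println("In-memory diagnostics : " + attempt.diagnostics);
    System.out.println("Diagnostics in .jhist : '" + jhist[0].diagnostics + "'");
  }
}
{code}

Running the sketch prints the NM diagnostics for the in-memory attempt but an 
empty string for the persisted record, which mirrors what ends up in the .jhist 
file.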

After the job completes and the user tries to access the failed task attempts 
through the JHS, the TaskAttemptUnsuccessfulCompletionEvent is parsed to 
generate the failed-attempt page. The page will not show the diagnostics info 
from the RM (such as "container killed by Node Manager...") because it was 
never written to the .jhist file in the first place.
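
For completeness, the symptom can also be observed directly from the history 
file. Below is a rough sketch using JobHistoryParser from 
org.apache.hadoop.mapreduce.jobhistory to dump the per-attempt error strings 
that the JHS page is built from (the accessor names are quoted from memory and 
the history-file path is a placeholder, so treat the details as assumptions):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.TaskAttemptInfo;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.TaskInfo;

public class JhistDiagnosticsDump {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path historyFile = new Path(args[0]);   // path to a job_*.jhist file
    FileSystem fs = historyFile.getFileSystem(conf);

    JobHistoryParser parser = new JobHistoryParser(fs, historyFile);
    JobInfo jobInfo = parser.parse();

    for (TaskInfo task : jobInfo.getAllTasks().values()) {
      for (TaskAttemptInfo attempt : task.getAllTaskAttempts().values()) {
        // For attempts hit by this bug, the error string comes back empty even
        // though the NM reported the kill diagnostics to the RM.
        System.out.println(attempt.getAttemptId() + " [" + attempt.getTaskStatus()
            + "]: '" + attempt.getError() + "'");
      }
    }
  }
}
{code}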

> Diagnostics information can be lost in .jhist if task containers are killed 
> by Node Manager.
> --------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6771
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6771
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.7.3
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>
> Task containers can go over their resource limit and be killed by the Node 
> Manager. The MR AM then gets notified of the container status and diagnostics 
> information through its heartbeat with the RM. However, it is possible that 
> the diagnostics information never gets into the .jhist file, so when the job 
> completes, the diagnostics information associated with the failed task 
> attempts is empty. This makes it hard for users to root-cause job failures 
> that are often caused by memory leaks.


