Jason Lowe created MAPREDUCE-4999:
-------------------------------------

             Summary: AM attempt ended up in ERROR state and generated history 
after node decommissioned
                 Key: MAPREDUCE-4999
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4999
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 0.23.6
            Reporter: Jason Lowe


Saw a case where a job recorded history for an app attempt that ended up in the
ERROR state after the node the AM was running on was decommissioned.  When the
node was decommissioned, the RM marked all containers on that node as killed
and then invalidated the application attempt.  Because the AM heartbeated to
the RM before the NM did (and therefore before the NM killed the AM), the AM
discovered it was no longer a valid app attempt and exited in the ERROR state.
However, it also concluded, incorrectly, that it was the last attempt and
generated the history for the job.

Decommissioning a node should not leave an app attempt in the ERROR state with
job history recorded; the subsequent app attempt should be the one to generate
the definitive history for the job.
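A minimal sketch of the intended decision, under stated assumptions: the enum
ExitReason, the reason INVALIDATED_BY_RM and the method shouldWriteHistory are
hypothetical names invented for illustration, not the MR AM's actual classes
or fields. It also assumes the RM launches a replacement attempt after
invalidating one. The point is that an attempt the RM has already invalidated
should not treat itself as the last attempt, whatever its attempt number.

```java
import java.util.Objects;

// Hypothetical illustration only; not Hadoop's actual MR AM code.
final class HistoryDecision {

    /** Hypothetical reasons an AM attempt may stop. */
    enum ExitReason { SUCCEEDED, FAILED, KILLED, INVALIDATED_BY_RM }

    /**
     * Returns true only if this attempt should write the definitive job history.
     * Assumption: an attempt invalidated by the RM (e.g. after its node was
     * decommissioned) will be followed by a new attempt, so it never writes history.
     */
    static boolean shouldWriteHistory(int attemptNumber, int maxAttempts, ExitReason reason) {
        Objects.requireNonNull(reason, "reason");
        if (reason == ExitReason.INVALIDATED_BY_RM) {
            // Defer to the replacement attempt instead of guessing "last attempt".
            return false;
        }
        boolean lastAttempt = attemptNumber >= maxAttempts;
        // Success and kill end the job; a failure ends it only on the last attempt.
        return reason != ExitReason.FAILED || lastAttempt;
    }

    public static void main(String[] args) {
        System.out.println(shouldWriteHistory(1, 2, ExitReason.INVALIDATED_BY_RM)); // false
        System.out.println(shouldWriteHistory(1, 2, ExitReason.FAILED));            // false
        System.out.println(shouldWriteHistory(2, 2, ExitReason.FAILED));            // true
        System.out.println(shouldWriteHistory(1, 2, ExitReason.SUCCEEDED));         // true
    }
}
```

A real fix would also need to know whether the RM will actually retry; if an
invalidated attempt were already the final one allowed, something else would
still have to record the job's history.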

--
This message is automatically generated by JIRA.

Reply via email to