Jason Lowe created MAPREDUCE-4890:
-------------------------------------

             Summary: Invalid TaskImpl state transitions when task fails while speculating
                 Key: MAPREDUCE-4890
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4890
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mr-am
    Affects Versions: 0.23.5, 2.0.2-alpha
            Reporter: Jason Lowe
            Priority: Critical


Two issues arise when a task fails while speculating (i.e., while multiple 
attempts are active):

# The other active attempts are not killed.
# TaskImpl's FAILED state does not handle the T_ATTEMPT_* set of events. The 
other active attempts can send these events asynchronously after the task has 
failed, so all of them need to be handled.
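
The two points above can be illustrated with a minimal, self-contained sketch. It is not the actual TaskImpl code: the class SpeculativeTaskSketch, its methods, the maxFailedAttempts parameter, and the specific event names are assumptions chosen for illustration, with the events only modeled on the T_ATTEMPT_* prefix mentioned above. The sketch records kill requests for the remaining attempts when the task fails, and absorbs any later attempt event while in the FAILED state instead of treating it as an invalid transition.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for TaskImpl; not the real Hadoop class.
class SpeculativeTaskSketch {
    enum TaskState { RUNNING, FAILED }

    // Illustrative event names modeled on the T_ATTEMPT_* prefix (assumed).
    enum TaskEvent {
        T_ATTEMPT_LAUNCHED, T_ATTEMPT_COMMIT_PENDING,
        T_ATTEMPT_SUCCEEDED, T_ATTEMPT_FAILED, T_ATTEMPT_KILLED
    }

    private final int maxFailedAttempts;   // assumed failure limit per task
    private int failedAttempts = 0;
    private TaskState state = TaskState.RUNNING;
    private final Set<String> activeAttempts = new LinkedHashSet<>();

    // Kill requests are recorded here instead of being dispatched as events.
    final List<String> killRequests = new ArrayList<>();

    SpeculativeTaskSketch(int maxFailedAttempts) {
        this.maxFailedAttempts = maxFailedAttempts;
    }

    TaskState getState() {
        return state;
    }

    void addAttempt(String attemptId) {
        activeAttempts.add(attemptId);
    }

    void handle(String attemptId, TaskEvent event) {
        if (state == TaskState.FAILED) {
            // Issue 2: late events from sibling attempts are absorbed and
            // the task stays FAILED rather than throwing on the transition.
            activeAttempts.remove(attemptId);
            return;
        }
        switch (event) {
            case T_ATTEMPT_FAILED:
                activeAttempts.remove(attemptId);
                failedAttempts++;
                if (failedAttempts >= maxFailedAttempts) {
                    state = TaskState.FAILED;
                    // Issue 1: ask every attempt that is still active to stop.
                    killRequests.addAll(activeAttempts);
                }
                break;
            case T_ATTEMPT_KILLED:
                activeAttempts.remove(attemptId);
                break;
            default:
                // Launch, commit-pending, and success handling are omitted.
                break;
        }
    }

    public static void main(String[] args) {
        SpeculativeTaskSketch task = new SpeculativeTaskSketch(1);
        task.addAttempt("attempt_0");
        task.addAttempt("attempt_1");   // speculative attempt
        task.handle("attempt_0", TaskEvent.T_ATTEMPT_FAILED);
        task.handle("attempt_1", TaskEvent.T_ATTEMPT_SUCCEEDED); // arrives late
        System.out.println(task.getState() + " " + task.killRequests);
    }
}
```

In this sketch the late T_ATTEMPT_SUCCEEDED from the speculative attempt leaves the task in FAILED, which is the behavior the second point asks for.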

If this is not handled properly, jobs that speculate and are configured to 
tolerate failures via mapreduce.map.failures.maxpercent or 
mapreduce.reduce.failures.maxpercent can easily fail because of an invalid 
state transition instead of completing successfully with a few explicitly 
allowed task failures.
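
As a rough illustration of what tolerating failures means here, the following sketch checks a failed-task count against an allowed percentage. The helper FailureToleranceSketch and its method are hypothetical and not part of Hadoop's API, and the comparison (failed tasks as a share of total tasks, strictly greater than the configured percentage) is an assumption about how such a limit could be evaluated.

```java
// Hypothetical helper for illustration; not part of Hadoop's API.
final class FailureToleranceSketch {
    private FailureToleranceSketch() {
    }

    /**
     * Returns true when the failed tasks exceed the allowed percentage,
     * i.e. when failedTasks / totalTasks > allowedPercent / 100.
     */
    static boolean exceedsAllowedFailures(int failedTasks, int totalTasks,
                                          int allowedPercent) {
        // Cross-multiplying in long arithmetic avoids rounding and overflow.
        return (long) failedTasks * 100 > (long) allowedPercent * totalTasks;
    }
}
```

Under this assumption, a job with 200 map tasks and an allowed percentage of 5 would still count as successful with 10 failed maps, and would fail with 11.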

