[ https://issues.apache.org/jira/browse/MAPREDUCE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated MAPREDUCE-5317:
----------------------------------
Status: Open (was: Patch Available)
FAIL_ABORT was intended to mirror KILL_ABORT and cover the state where we are
awaiting the completion of the abortJob call from the committer thread. In
that sense, it would be easier to understand what's happening in each state
if FAIL_WAIT were a separate state from FAIL_ABORT. The pair would then
mirror the KILL_WAIT / KILL_ABORT states that exist in the job kill path.
Speaking of the KILL_WAIT state, if we're using a timeout for FAIL_WAIT, it
should be applied to KILL_WAIT as well. We could just rename the event to
something like JOB_WAIT_TIMEOUT.
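
To make the proposed layout concrete, here is a minimal sketch of the two parallel paths with a shared timeout event. It is a standalone toy, not the actual job state machine: the class name, the enums, and the TASKS_FINISHED event are assumptions made for illustration, and only the FAIL_* / KILL_* state names and the JOB_WAIT_TIMEOUT suggestion come from the comment above.

```java
final class JobWaitStatesSketch {
    // Hypothetical, simplified state and event names; not Hadoop's actual enums.
    enum State { RUNNING, FAIL_WAIT, FAIL_ABORT, FAILED, KILL_WAIT, KILL_ABORT, KILLED }
    enum Event { JOB_FAIL, JOB_KILL, TASKS_FINISHED, JOB_WAIT_TIMEOUT, JOB_ABORT_COMPLETED }

    /** Returns the next state; events with no listed transition leave the state unchanged. */
    static State next(State state, Event event) {
        switch (state) {
            case RUNNING:
                if (event == Event.JOB_FAIL) return State.FAIL_WAIT;
                if (event == Event.JOB_KILL) return State.KILL_WAIT;
                return state;
            case FAIL_WAIT:
                // Waiting for tasks to finish; the shared timeout bounds the wait.
                return doneWaiting(event) ? State.FAIL_ABORT : state;
            case KILL_WAIT:
                // Same wait on the kill path, bounded by the same timeout event.
                return doneWaiting(event) ? State.KILL_ABORT : state;
            case FAIL_ABORT:
                // Waiting for the committer's abortJob call to complete.
                return event == Event.JOB_ABORT_COMPLETED ? State.FAILED : state;
            case KILL_ABORT:
                return event == Event.JOB_ABORT_COMPLETED ? State.KILLED : state;
            default:
                return state; // FAILED and KILLED are terminal in this sketch
        }
    }

    private static boolean doneWaiting(Event event) {
        return event == Event.TASKS_FINISHED || event == Event.JOB_WAIT_TIMEOUT;
    }

    public static void main(String[] args) {
        State s = State.RUNNING;
        Event[] events = {Event.JOB_FAIL, Event.JOB_WAIT_TIMEOUT, Event.JOB_ABORT_COMPLETED};
        for (Event e : events) {
            s = next(s, e);
            System.out.println(e + " -> " + s);
        }
    }
}
```

With the wait and abort phases split this way, each state has exactly one thing it is waiting for, and a single timeout event serves both the fail and kill paths.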
The JOB_FAIL_WAIT_TIMEDOUT event needs to be handled in more than just the
FAIL_WAIT state. The event could be backed up in the event queue and delivered
*after* we leave the state, so it needs to be at least ignored in any
downstream states from FAIL_WAIT.
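
The late-delivery problem can be reproduced with a small table-driven sketch. This is an illustration only, assuming a strict dispatcher that rejects any (state, event) pair without a registered transition; the class, the TASKS_FINISHED event, and the IllegalStateException are hypothetical stand-ins and not the real job implementation.

```java
import java.util.ArrayDeque;
import java.util.EnumMap;
import java.util.Map;
import java.util.Queue;

final class LateTimeoutSketch {
    // Hypothetical, trimmed-down states and events for the failure path only.
    enum State { FAIL_WAIT, FAIL_ABORT, FAILED }
    enum Event { TASKS_FINISHED, JOB_FAIL_WAIT_TIMEDOUT, JOB_ABORT_COMPLETED }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.FAIL_WAIT;

    LateTimeoutSketch(boolean ignoreLateTimeout) {
        add(State.FAIL_WAIT, Event.TASKS_FINISHED, State.FAIL_ABORT);
        add(State.FAIL_WAIT, Event.JOB_FAIL_WAIT_TIMEDOUT, State.FAIL_ABORT);
        add(State.FAIL_ABORT, Event.JOB_ABORT_COMPLETED, State.FAILED);
        if (ignoreLateTimeout) {
            // Self-transitions: a stale timeout is accepted and changes nothing.
            add(State.FAIL_ABORT, Event.JOB_FAIL_WAIT_TIMEDOUT, State.FAIL_ABORT);
            add(State.FAILED, Event.JOB_FAIL_WAIT_TIMEDOUT, State.FAILED);
        }
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    /** Strict dispatch: an event with no registered transition is an error. */
    void handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " at " + current);
        }
        current = next;
    }

    /** Tasks finish first, but the timeout is already queued behind that event. */
    static State run(boolean ignoreLateTimeout) {
        LateTimeoutSketch job = new LateTimeoutSketch(ignoreLateTimeout);
        Queue<Event> queue = new ArrayDeque<>();
        queue.add(Event.TASKS_FINISHED);          // leaves FAIL_WAIT
        queue.add(Event.JOB_FAIL_WAIT_TIMEDOUT);  // arrives after we left FAIL_WAIT
        queue.add(Event.JOB_ABORT_COMPLETED);
        while (!queue.isEmpty()) {
            job.handle(queue.poll());
        }
        return job.current;
    }

    public static void main(String[] args) {
        System.out.println(run(true));
    }
}
```

In this sketch, `run(true)` reaches FAILED, while `run(false)` throws as soon as the stale timeout is dispatched in FAIL_ABORT, which is the failure mode the ignore transitions are meant to prevent.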
The findbugs warning also seems legit.
> Stale files left behind for failed jobs
> ---------------------------------------
>
> Key: MAPREDUCE-5317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5317
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.8, 2.0.4-alpha, 3.0.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: MAPREDUCE-5317.patch, MAPREDUCE-5317.patch
>
>
> Courtesy [~amar_kamat]!
> {quote}
> We are seeing _temporary files left behind in the output folder if the job
> fails.
> The job failed due to hitting a quota issue.
> I simply ran the randomwriter (from hadoop examples) with the default setting.
> That failed and left behind some stray files.
> {quote}