When a reducer fails on DFS quota, the job should fail immediately
------------------------------------------------------------------
Key: MAPREDUCE-1967
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1967
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Dick King
Suppose an M/R job has so much output that the user is certain to exceed hir
quota. Some of the reducers will succeed, but the job will then get into a
state where the unfinished reducers squabble over whatever space is left. They
nibble at that space until one reducer fails on quota. Its output file is
erased, and the other reducers collectively consume the freed space until one
of _them_ fails on quota. Since the incomplete reducer that fails on quota is
effectively "chosen" at random, the tasks accumulate failures at similar rates,
so it takes a long time for any single task to reach its retry limit. By the
time the job finally fails, the system will have made a substantial futile
investment.
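To make the futile investment concrete, here is a toy Python simulation under
simplifying assumptions: output is written in equal-size chunks, at each step a
randomly chosen unfinished reducer writes one chunk, a failed attempt's partial
output is erased and the task restarts from scratch, and the job fails once any
single task has failed `max_attempts` times (four, matching the count in this
report). The function `simulate`, its parameters, and the chunk model are
hypothetical; this does not model Hadoop's actual scheduler.

```python
import random
from dataclasses import dataclass


@dataclass
class Outcome:
    job_failed: bool
    quota_failures: int  # reducer attempts killed by the quota
    chunks_written: int  # every chunk written, including erased ones


def simulate(n_reducers, chunks_needed, free_chunks, max_attempts=4,
             fail_fast=False, seed=0):
    """Toy model of reducers racing for the last free_chunks of DFS quota.

    Hypothetical model: each step, a randomly chosen unfinished reducer writes
    one chunk. If the quota is exhausted, that attempt fails, its partial
    output is erased (returning the space), and the task is retried.
    """
    if chunks_needed < 1:
        raise ValueError("chunks_needed must be at least 1")
    rng = random.Random(seed)
    written = [0] * n_reducers   # chunks held by each task's current attempt
    failures = [0] * n_reducers  # failed attempts per task
    unfinished = set(range(n_reducers))
    quota_failures = chunks_written = 0

    while unfinished:
        r = rng.choice(sorted(unfinished))
        if free_chunks > 0:
            written[r] += 1
            free_chunks -= 1
            chunks_written += 1
            if written[r] == chunks_needed:
                unfinished.discard(r)  # this reducer's output is committed
            continue
        # Quota exceeded: erase this attempt's output and record the failure.
        quota_failures += 1
        failures[r] += 1
        free_chunks += written[r]
        written[r] = 0
        if fail_fast or failures[r] >= max_attempts:
            return Outcome(True, quota_failures, chunks_written)
    return Outcome(False, quota_failures, chunks_written)
```

Comparing `simulate(8, 10, 75)` with `simulate(8, 10, 75, fail_fast=True)`
(same seed) shows the difference: the fail-fast run stops at the first quota
failure, while the default run keeps writing and erasing chunks until one task
has failed four times.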
I propose that if a single reducer fails on DFS quota, the job should be failed
immediately. There may be a corner case that leads us to think we shouldn't be
quite this stringent, but at the very least we shouldn't have to wait for one
task to fail four times before shutting the job down.
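The proposed policy could look roughly like the following Python sketch. The
class `FailFastOnQuota`, its parameters, and the matching on exception class
names are hypothetical and not part of Hadoop; the names in `QUOTA_ERRORS` are
the simple names of HDFS's quota exceptions
(`org.apache.hadoop.hdfs.protocol.QuotaExceededException` and its subclasses
`DSQuotaExceededException` and `NSQuotaExceededException`). The assumed
`quota_failure_limit` parameter leaves room for the less stringent corner case:
raising it above 1 tolerates a few quota failures before failing the job.

```python
# Simple class names of HDFS quota exceptions (matched by name in this sketch).
QUOTA_ERRORS = frozenset({
    "QuotaExceededException",
    "DSQuotaExceededException",
    "NSQuotaExceededException",
})


class FailFastOnQuota:
    """Hypothetical job-failure policy: fail the job on DFS quota errors."""

    def __init__(self, max_attempts=4, quota_failure_limit=1):
        self.max_attempts = max_attempts                # per-task retry limit
        self.quota_failure_limit = quota_failure_limit  # job-wide quota limit
        self.task_failures = {}                         # task id -> failures
        self.quota_failures = 0                         # across all tasks

    def on_attempt_failed(self, task_id, cause):
        """Record a failed attempt; return True if the job should now fail."""
        self.task_failures[task_id] = self.task_failures.get(task_id, 0) + 1
        if cause in QUOTA_ERRORS:
            self.quota_failures += 1
            if self.quota_failures >= self.quota_failure_limit:
                return True  # proposed: stop before the reducers squabble
        # Otherwise keep the usual per-task retry limit.
        return self.task_failures[task_id] >= self.max_attempts
```

Counting quota failures job-wide rather than per task reflects the observation
that these failures are spread across tasks at similar rates, so no single
task's counter would reach the limit quickly.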
--
This message is automatically generated by JIRA.