When a reducer fails on DFS quota, the job should fail immediately
------------------------------------------------------------------

                 Key: MAPREDUCE-1967
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1967
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Dick King


Suppose an M/R job produces so much output that the user is certain to exceed
their DFS quota.  Some of the reducers will succeed, but the job will then
reach a state where the remaining reducers squabble over what space is left.
They nibble at it until one reducer fails on quota.  Its output file is
erased, and the other reducers collectively consume the freed space until one
of _them_ fails on quota.  Since the incomplete reducer that fails on quota is
"chosen" randomly, the tasks accumulate failures at similar rates, and by the
time any one task exhausts its attempts the system has made a substantial
futile investment.
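
To make the cost concrete, here is a toy simulation in Python.  It is only a
sketch under simplifying assumptions, not Hadoop code: output is counted in
abstract "units", a randomly chosen running reducer writes one unit per step,
a reducer that tries to write when the quota is full loses its whole output
file, and the job is failed once one task has failed max_attempts times.  The
function name and all parameters are made up for illustration.

```python
import random


def units_written_before_job_ends(quota, per_reducer, reducers,
                                  max_attempts=4, fail_fast=False, seed=0):
    """Toy model: count the units written to DFS before the job ends."""
    rng = random.Random(seed)
    written = [0] * reducers    # units currently held by each reducer's file
    attempts = [0] * reducers   # failed attempts per reducer task
    done = [False] * reducers
    used = 0                    # quota currently consumed
    total_written = 0           # every unit ever written, including erased ones

    while True:
        running = [i for i in range(reducers) if not done[i]]
        if not running:
            return total_written  # all reducers finished; output fit the quota

        i = rng.choice(running)   # the "randomly chosen" reducer
        if used == quota:
            # Quota failure: this attempt's output file is erased.
            used -= written[i]
            written[i] = 0
            attempts[i] += 1
            if fail_fast or attempts[i] >= max_attempts:
                return total_written  # job is failed
            continue

        written[i] += 1
        used += 1
        total_written += 1
        if written[i] == per_reducer:
            done[i] = True        # finished reducers keep their space


retrying = units_written_before_job_ends(quota=500, per_reducer=100, reducers=10)
failing_fast = units_written_before_job_ends(quota=500, per_reducer=100,
                                             reducers=10, fail_fast=True)
print("retrying:", retrying, "fail-fast:", failing_fast)
```

In this model, whenever the total output exceeds the quota, the fail-fast run
stops after exactly `quota` units have been written, while the retrying run
goes on rewriting erased output and so never writes less than that.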

I propose that if a single reducer fails on DFS quota, the job should be
failed.  There may be a corner case that argues against being quite this
stringent, but at the very least we shouldn't have to wait for four failures
by one task before shutting the job down.
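
The proposed rule can be sketched as a small decision function.  This is a
hedged illustration in Python, not the framework's actual API: the failure
labels and the function name are hypothetical, and the default of four
attempts is taken from the description above.

```python
# Hypothetical failure labels, used only for this illustration.
QUOTA_EXCEEDED = "quota_exceeded"
OTHER_FAILURE = "other"


def job_should_fail(cause, failed_attempts_of_task, max_attempts=4):
    """Decide whether a task-attempt failure should fail the whole job.

    cause: why the latest attempt failed (one of the labels above).
    failed_attempts_of_task: failures so far for this task, including this one.
    """
    if cause == QUOTA_EXCEEDED:
        # Retrying cannot help: the job's total output will not fit the quota.
        return True
    # Any other failure keeps the usual per-task retry budget.
    return failed_attempts_of_task >= max_attempts


print(job_should_fail(QUOTA_EXCEEDED, 1))  # first quota failure ends the job
print(job_should_fail(OTHER_FAILURE, 1))   # other failures are still retried
```

Keeping the ordinary retry budget for every other failure cause limits the
change to the one case where a retry is known to be futile.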

