[ https://issues.apache.org/jira/browse/HADOOP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492634 ]
Arun C Murthy commented on HADOOP-1144:
---------------------------------------

I'd like to propose a 'mapred.task.failures.percent' config knob which defaults to 0, implying that *any* failed TIP leads to the job being declared a failure, i.e. the current behaviour. It could instead be set to '100', which leads to a 'best-effort' kind of job where TIPs that fail 4 times are abandoned as per HADOOP-39; this should also satisfy the requirements that Christian and Andrzej describe.

I'm not entirely comfortable with HADOOP-39 in itself, since it does not address the situation where a user might not want the job to run to completion when more than, say, 50% of the maps fail.

W.r.t. the interface for reporting failures, what do others think they need from it? Would it be useful, e.g., to get the details of the 'input splits' of the TIPs which failed? Anything else? Thoughts?

A rough sketch of how such a knob might be set from a job's configuration is appended at the end of this message.

> Hadoop should allow a configurable percentage of failed map tasks before
> declaring a job failed.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Christian Kunz
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>
> In our environment it can occur that some map tasks fail repeatedly
> because of corrupt input data, which is sometimes non-critical as long as the
> amount is limited. In this case it is annoying that the whole Hadoop job
> fails and cannot be restarted until the corrupt data are identified and
> eliminated from the input. It would be extremely helpful if the job
> configuration allowed one to indicate how many map tasks are allowed to fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
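
For illustration only, here is a minimal sketch of setting the proposed knob from a job driver. It assumes the property name 'mapred.task.failures.percent' suggested above is adopted as-is; the knob is not part of any released API, so treat the name and semantics as tentative.

  // Hypothetical sketch: the property name below is only the one proposed
  // in this comment and is not available in any Hadoop release yet.
  import org.apache.hadoop.mapred.JobConf;

  public class FailureTolerantJob {
    public static void main(String[] args) {
      JobConf conf = new JobConf(FailureTolerantJob.class);

      // 0 (proposed default): any permanently failed map TIP fails the whole job.
      // 100: 'best-effort' job; TIPs that fail 4 times are abandoned (HADOOP-39)
      //      without failing the job.
      conf.set("mapred.task.failures.percent", "10");

      // ... set input/output formats, mapper, reducer, etc., then submit as usual.
    }
  }

With a value of 10 as above, the intent would be that the job succeeds as long as no more than 10% of its map TIPs are abandoned.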