[ https://issues.apache.org/jira/browse/HADOOP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12484207 ]
Andrzej Bialecki commented on HADOOP-1144:
-------------------------------------------
Nutch could use this feature too. It is quite common for one of the map tasks,
e.g. one parsing difficult content such as PDF or MS Word documents, to crash or
get stuck. This should not be fatal to the whole job.
As for configuring the number of failed tasks, I think it would be good to be
able to choose between an absolute number and a percentage.
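
To make the absolute-versus-percentage idea concrete, here is a minimal sketch
of how such a threshold might be represented and checked. It is an illustration
only and not part of Hadoop: the class name FailureTolerance, its methods, and
the "5" / "5%" value syntax are all hypothetical.

```java
/**
 * Hypothetical sketch (not a Hadoop API): a tolerance for failed map tasks,
 * given either as an absolute count or as a percentage of all map tasks.
 */
public final class FailureTolerance {
    private final boolean isPercent;
    private final int limit;

    private FailureTolerance(boolean isPercent, int limit) {
        if (limit < 0 || (isPercent && limit > 100)) {
            throw new IllegalArgumentException("invalid limit: " + limit);
        }
        this.isPercent = isPercent;
        this.limit = limit;
    }

    /** Tolerate up to maxFailedTasks failed tasks. */
    public static FailureTolerance absolute(int maxFailedTasks) {
        return new FailureTolerance(false, maxFailedTasks);
    }

    /** Tolerate up to maxFailedPercent percent of all tasks failing. */
    public static FailureTolerance percent(int maxFailedPercent) {
        return new FailureTolerance(true, maxFailedPercent);
    }

    /** Assumed syntax: "5" is an absolute count, "5%" is a percentage. */
    public static FailureTolerance parse(String value) {
        String v = value.trim();
        if (v.endsWith("%")) {
            return percent(Integer.parseInt(v.substring(0, v.length() - 1).trim()));
        }
        return absolute(Integer.parseInt(v));
    }

    /** Returns true once the failures exceed the configured tolerance. */
    public boolean jobFailed(int failedTasks, int totalTasks) {
        if (isPercent) {
            // Cross-multiply in long arithmetic to avoid floating-point rounding.
            return (long) failedTasks * 100 > (long) limit * totalTasks;
        }
        return failedTasks > limit;
    }
}
```

With this sketch, FailureTolerance.parse("5%") would let a 100-task job survive
5 failed map tasks but not 6, while FailureTolerance.parse("3") would allow 3
failures regardless of job size.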
> Hadoop should allow a configurable percentage of failed map tasks before
> declaring a job failed.
> ------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1144
> URL: https://issues.apache.org/jira/browse/HADOOP-1144
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.12.0
> Reporter: Christian Kunz
> Fix For: 0.13.0
>
>
> In our environment, some map tasks can fail repeatedly because of corrupt
> input data. This is sometimes non-critical as long as the amount of corrupt
> data is limited. In this case it is annoying that the whole Hadoop job
> fails and cannot be restarted until the corrupt data are identified and
> removed from the input. It would be extremely helpful if the job
> configuration allowed specifying how many map tasks are allowed to fail.