[ https://issues.apache.org/jira/browse/HADOOP-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492773 ]

Andrzej Bialecki  commented on HADOOP-1144:
-------------------------------------------

bq. I'd rather have the user implement a simple sub-class of the RecordReader 
in question to ignore the exception and return 'false' from next(key, value) - 
that should be very easy, no?

Yes, it would - however, usually you are dealing with the same application and 
changing data, and in most cases the data is valid. So it's easier to accept 
corrupted input data for a single job by turning a config knob than by 
re-implementing all your InputFormats ...
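For illustration, a minimal sketch of that subclassing workaround, assuming a job that reads text input through the standard line-based reader. The wrapper name SkipErrorsRecordReader, the choice to simply end the split on the first read error, and the generic RecordReader signatures (used by later mapred releases; 0.12's interface is still Writable-based) are all assumptions:

{code:java}
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.RecordReader;

// Hypothetical wrapper that swallows read errors and ends the split
// instead of letting the exception fail the map task.
public class SkipErrorsRecordReader implements RecordReader<LongWritable, Text> {
  private final RecordReader<LongWritable, Text> delegate;

  public SkipErrorsRecordReader(RecordReader<LongWritable, Text> delegate) {
    this.delegate = delegate;
  }

  public boolean next(LongWritable key, Text value) throws IOException {
    try {
      return delegate.next(key, value);
    } catch (IOException e) {
      // Corrupt record: give up on the rest of this split rather than fail.
      return false;
    }
  }

  public LongWritable createKey() { return delegate.createKey(); }
  public Text createValue() { return delegate.createValue(); }
  public long getPos() throws IOException { return delegate.getPos(); }
  public float getProgress() throws IOException { return delegate.getProgress(); }
  public void close() throws IOException { delegate.close(); }
}
{code}

You would still have to return this wrapper from getRecordReader() in every InputFormat the job uses, which is exactly the per-InputFormat work the comment above is trying to avoid.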

> Hadoop should allow a configurable percentage of failed map tasks before 
> declaring a job failed.
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1144
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1144
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Christian Kunz
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>
> In our environment some map tasks can fail repeatedly because of corrupt 
> input data, which is sometimes non-critical as long as the amount is 
> limited. In that case it is annoying that the whole Hadoop job fails and 
> cannot be restarted until the corrupt data are identified and eliminated 
> from the input. It would be extremely helpful if the job configuration 
> allowed one to indicate how many map tasks are allowed to fail.

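For completeness, a sketch of what the requested per-job knob could look like from the client side. The accessor setMaxMapTaskFailuresPercent and the property name mapred.max.map.failures.percent match what later Hadoop releases expose, but treat them as assumptions here rather than something specified by this issue:

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class TolerantJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TolerantJob.class);
    conf.setJobName("tolerant-job");

    // Tolerate up to 5% of map tasks failing before the whole job is failed.
    conf.setMaxMapTaskFailuresPercent(5);          // assumed accessor
    // Equivalent raw property (assumed name):
    // conf.setInt("mapred.max.map.failures.percent", 5);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}
{code}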
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
