[ https://issues.apache.org/jira/browse/HADOOP-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592840#action_12592840 ]
Doug Cutting commented on HADOOP-153:
-------------------------------------

> why not, simply, let the record reader skip the bad record(s) [ ... ] ?

If the record reader can identify bad records and skip them, then the framework need not get involved: the RecordReader itself can catch exceptions that it knows might be thrown by bad records and then try to read the next record.

> skip records that throw exceptions
> ----------------------------------
>
> Key: HADOOP-153
> URL: https://issues.apache.org/jira/browse/HADOOP-153
> Project: Hadoop Core
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.2.0
> Reporter: Doug Cutting
> Assignee: Devaraj Das
>
> MapReduce should skip records that throw exceptions.
> If the exception is thrown under RecordReader.next() then RecordReader
> implementations should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless
> they happen under RecordWriter.write(). Cancelling partial output could be
> hard. So such output errors will still result in task failure.
> This behaviour should be optional, but enabled by default. A count of errors
> per task and job should be maintained and displayed in the web ui. Perhaps
> if some percentage of records (>50%?) result in exceptions then the task
> should fail. This would stop jobs early that are misconfigured or have buggy
> code.
> Thoughts?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
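
For illustration only, here is a rough sketch of the reader-side skipping described in the comment above. It deliberately does not use Hadoop's RecordReader interface: SimpleReader, BadRecordException, SkippingReader and IntLineReader are all hypothetical names invented for this example. It also assumes the underlying reader has already advanced past the bad record by the time it throws, so that simply calling next() again makes progress.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkippingReaderSketch {

    /** Hypothetical stand-in for a record reader; NOT Hadoop's RecordReader. */
    interface SimpleReader<T> {
        /** Returns the next record, or null at end of input. */
        T next() throws IOException;
    }

    /** Hypothetical exception the reader knows a corrupt record may raise. */
    static class BadRecordException extends IOException {
        BadRecordException(String message) { super(message); }
    }

    /** Wraps a reader, skips records that fail, and counts the skips. */
    static class SkippingReader<T> implements SimpleReader<T> {
        private final SimpleReader<T> delegate;
        private long skipped = 0;

        SkippingReader(SimpleReader<T> delegate) { this.delegate = delegate; }

        @Override
        public T next() throws IOException {
            while (true) {
                try {
                    return delegate.next();
                } catch (BadRecordException e) {
                    // Assumes the delegate already moved past the bad record,
                    // so retrying reads the following one.
                    skipped++;
                }
                // Any other IOException still propagates to the caller.
            }
        }

        long getSkipped() { return skipped; }
    }

    /** Toy reader that parses one integer per line. */
    static class IntLineReader implements SimpleReader<Integer> {
        private final Iterator<String> lines;

        IntLineReader(List<String> lines) { this.lines = lines.iterator(); }

        @Override
        public Integer next() throws IOException {
            if (!lines.hasNext()) {
                return null;
            }
            String line = lines.next(); // advance before parsing
            try {
                return Integer.valueOf(line.trim());
            } catch (NumberFormatException e) {
                throw new BadRecordException("not an integer: " + line);
            }
        }
    }

    /** Reads everything and reports the good records plus the skip count. */
    static String summarize(List<String> lines) {
        SkippingReader<Integer> reader =
            new SkippingReader<>(new IntLineReader(lines));
        List<Integer> good = new ArrayList<>();
        try {
            for (Integer v = reader.next(); v != null; v = reader.next()) {
                good.add(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return good + " skipped=" + reader.getSkipped();
    }

    public static void main(String[] args) {
        // prints "[1, 3] skipped=1"
        System.out.println(summarize(Arrays.asList("1", "oops", "3")));
    }
}
```

The wrapper only swallows the one exception type it knows to mean "bad record", and the skip counter is the sort of per-task error count the issue description asks for. A real reader would also need a way to resynchronise to the start of a subsequent record, which this toy example gets for free from line-oriented input.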