[ https://issues.apache.org/jira/browse/HADOOP-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593060#action_12593060 ]
Yiping Han commented on HADOOP-153:
-----------------------------------

In our use case, when a task is re-executed, we do not want to skip the killer records. Instead, we want our user-supplied mapper to be able to identify them, so that it can take a different code path and handle them specially. So, during re-execution, we would like the records to be marked with a flag indicating that they were killer records in previous runs (a hypothetical sketch of such a mapper appears after the quoted description below). Any alternative that gives us the same information would also help.

> skip records that throw exceptions
> ----------------------------------
>
>                 Key: HADOOP-153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-153
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.2.0
>            Reporter: Doug Cutting
>            Assignee: Devaraj Das
>
> MapReduce should skip records that throw exceptions.
> If an exception is thrown under RecordReader.next(), then RecordReader implementations should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless they happen under RecordWriter.write(). Cancelling partial output could be hard, so such output errors will still result in task failure.
> This behaviour should be optional, but enabled by default. A count of errors per task and job should be maintained and displayed in the web UI. Perhaps if some percentage of records (>50%?) result in exceptions, the task should fail. This would stop jobs early that are misconfigured or have buggy code.
> Thoughts?
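The skipping behaviour the description proposes could be prototyped as a wrapper around an existing RecordReader, using the old org.apache.hadoop.mapred interface. The sketch below is only illustrative: the class name, the threshold parameter, and the assumption that the wrapped reader re-synchronizes after a failed next() (true for line-oriented readers) are mine, not part of any committed Hadoop API. It also folds in the suggested per-task error counting and percentage-based failure.

{code:java}
import java.io.IOException;

import org.apache.hadoop.mapred.RecordReader;

/**
 * Hypothetical sketch: wraps an existing RecordReader and skips records
 * whose next() throws, failing the task once too many records are bad.
 * The names and the failure threshold are illustrative only.
 */
public class SkippingRecordReader<K, V> implements RecordReader<K, V> {
  private final RecordReader<K, V> delegate;
  private final double maxSkipFraction;  // e.g. 0.5, per the ">50%?" idea
  private long attempted = 0;            // records attempted so far
  private long skipped = 0;              // records skipped due to exceptions

  public SkippingRecordReader(RecordReader<K, V> delegate,
                              double maxSkipFraction) {
    this.delegate = delegate;
    this.maxSkipFraction = maxSkipFraction;
  }

  public boolean next(K key, V value) throws IOException {
    while (true) {
      attempted++;
      try {
        return delegate.next(key, value);  // normal case
      } catch (IOException e) {
        throw e;                           // treat I/O errors as fatal
      } catch (RuntimeException e) {
        skipped++;                         // bad record: count and move on
        if ((double) skipped / attempted > maxSkipFraction) {
          throw new IOException(
              "too many bad records: " + skipped + "/" + attempted, e);
        }
        // Loop and try the next record. This assumes the delegate has
        // advanced past the bad record, i.e. it can re-sync itself.
      }
    }
  }

  public K createKey() { return delegate.createKey(); }
  public V createValue() { return delegate.createValue(); }
  public long getPos() throws IOException { return delegate.getPos(); }
  public void close() throws IOException { delegate.close(); }
  public float getProgress() throws IOException {
    return delegate.getProgress();
  }
}
{code}

The skipped counter would be the natural thing to surface as a task counter in the web UI.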
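Yiping Han's request is different: rather than skipping killer records, the mapper should be able to recognize them on re-execution. A minimal sketch of what that might look like against the old mapred API follows. The mapred.killer.record.offsets property, the use of byte offsets as record identity, and whatever framework piece would publish that property are all hypothetical; nothing in Hadoop provides them today.

{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SpecializingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Byte offsets of records that crashed previous attempts. The
  // "mapred.killer.record.offsets" property is invented for illustration;
  // nothing publishes it today.
  private final Set<Long> killerOffsets = new HashSet<Long>();

  public void configure(JobConf job) {
    String[] offsets =
        job.getStrings("mapred.killer.record.offsets", new String[0]);
    for (String o : offsets) {
      killerOffsets.add(Long.valueOf(o.trim()));
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    if (killerOffsets.contains(key.get())) {
      // Known killer record: take a defensive code path instead of
      // re-triggering the crash, e.g. a slower but stricter parser,
      // or tag the record for offline inspection as done here.
      reporter.incrCounter("records", "killer", 1);
      output.collect(new Text("killer"), value);
    } else {
      output.collect(new Text("normal"), value);  // normal fast path
    }
  }
}
{code}

Identifying records by byte offset only works for input formats whose keys are positions, such as TextInputFormat; a general mechanism would need a format-independent record id.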