[ https://issues.apache.org/jira/browse/HADOOP-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592734#action_12592734 ]

Joydeep Sen Sarma commented on HADOOP-153:
------------------------------------------

Hey folks - we are having a discussion on a similar JIRA covering a smaller 
subset of issues, HADOOP-3144. We are actually hitting this problem (corrupted 
records causing OOMs) and have a simple workaround specific to our case.

But I am a little intrigued by the proposal here. For the RecordReader issues, 
why not simply let the record reader skip the bad record(s)? As the discussion 
here mentions, there would have to be additional APIs in the record reader for 
it to skip problematic records. If the framework trusts record readers to be 
able to skip bad records, why bother re-executing? Why not allow them to detect 
and skip bad records on the very first try? If the TT/JT want to keep track of 
and impose a limit on the bad records skipped, they could ask the record reader 
to report that count through an API. A rough sketch follows.
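
A minimal sketch of what I mean, using the old mapred RecordReader interface. 
The SkippingRecordReader name, the skip cap, and the assumption that the 
underlying reader advances past a bad record before throwing are all 
hypothetical, not anything that exists in Hadoop today:

    import java.io.IOException;
    import org.apache.hadoop.mapred.RecordReader;

    // Hypothetical wrapper: skips records whose next() throws, and counts
    // the skips so the TT/JT could query and enforce a limit.
    public class SkippingRecordReader<K, V> implements RecordReader<K, V> {
      // Assumption: give up if the reader cannot make progress at all.
      private static final int MAX_CONSECUTIVE_SKIPS = 100;

      private final RecordReader<K, V> delegate;
      private long skipped = 0;

      public SkippingRecordReader(RecordReader<K, V> delegate) {
        this.delegate = delegate;
      }

      public boolean next(K key, V value) throws IOException {
        int consecutive = 0;
        while (true) {
          try {
            return delegate.next(key, value);   // normal path
          } catch (IOException e) {
            skipped++;                          // bad record: count it, move on
            // Assumes the delegate advances past the bad record before
            // throwing; bail out if it appears stuck at one position.
            if (++consecutive > MAX_CONSECUTIVE_SKIPS) {
              throw e;
            }
          }
        }
      }

      // The reporting API proposed above: how many records were skipped.
      public long getSkippedRecords() {
        return skipped;
      }

      public K createKey() { return delegate.createKey(); }
      public V createValue() { return delegate.createValue(); }
      public long getPos() throws IOException { return delegate.getPos(); }
      public float getProgress() throws IOException { return delegate.getProgress(); }
      public void close() throws IOException { delegate.close(); }
    }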

Exceptions from map/reduce functions are different: if they make the entire 
task unstable due to OOM issues, then re-execution makes sense. But if we 
separate the two issues, we may have a more lightweight way of tolerating 
pure data corruption/validity problems, as we are trying to do in HADOOP-3144.
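
For the map-side case, something like the following would do for pure 
data-validity failures. This is a hypothetical mapper against the old mapred 
API; the counter enum is a job-defined assumption, not an existing Hadoop 
counter, and it only helps when the exception leaves the task healthy:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical mapper that tolerates per-record failures by catching
    // and counting them. An OOM or a wedged JVM still needs re-execution.
    public class TolerantMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

      enum Counters { BAD_RECORDS }   // assumption: a job-defined counter

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> output,
                      Reporter reporter) throws IOException {
        try {
          // ... real per-record logic goes here ...
          output.collect(value, new LongWritable(1));
        } catch (RuntimeException e) {
          // Log-and-skip: the count shows up in the job counters / web UI.
          reporter.incrCounter(Counters.BAD_RECORDS, 1);
        }
      }
    }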

> skip records that throw exceptions
> ----------------------------------
>
>                 Key: HADOOP-153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-153
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.2.0
>            Reporter: Doug Cutting
>            Assignee: Devaraj Das
>
> MapReduce should skip records that throw exceptions.
> If the exception is thrown under RecordReader.next() then RecordReader 
> implementations should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless 
> they happen under RecordWriter.write().  Cancelling partial output could be 
> hard.  So such output errors will still result in task failure.
> This behaviour should be optional, but enabled by default.  A count of errors 
> per task and job should be maintained and displayed in the web UI.  Perhaps 
> if some percentage of records (>50%?) results in exceptions then the task 
> should fail.  That would stop misconfigured or buggy jobs early.
> Thoughts?
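
The >50% cutoff above might reduce to a check like the following. The class 
name, the 100-record warm-up, and the 0.5 default are assumptions for 
illustration, not existing Hadoop APIs:

    import java.io.IOException;

    // Hypothetical threshold check: fail the task once the bad-record
    // fraction exceeds a configurable cap, per the >50% idea above.
    public class BadRecordPolicy {
      private static final long MIN_RECORDS = 100;  // assumption: warm-up before judging
      private final float maxBadFraction;           // e.g. 0.5f per the proposal
      private long total = 0;
      private long bad = 0;

      public BadRecordPolicy(float maxBadFraction) {
        this.maxBadFraction = maxBadFraction;
      }

      // Record one processed record; throw once too many were bad.
      public void record(boolean wasBad) throws IOException {
        total++;
        if (wasBad) bad++;
        if (total >= MIN_RECORDS && (float) bad / total > maxBadFraction) {
          throw new IOException("Too many bad records: " + bad + "/" + total);
        }
      }
    }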

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
