[ https://issues.apache.org/jira/browse/HADOOP-3666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610271#action_12610271 ]

Joydeep Sen Sarma commented on HADOOP-3666:
-------------------------------------------

Sorry - I didn't see Owen's comment earlier. Yes, #2 was the intent.

I agree that this leaves no policy hooks (a maximum number of tolerable errors, 
reporting errors back to the JobTracker) - and I think that's what prompted 
Sharad's comment. One problem with his proposal is how the RecordReader would 
differentiate between the first and second next() call.

One simple way to integrate this with the policy framework would be for the 
RecordReader to export an error counter (as an additional interface). The 
TaskTracker/JobTracker can then make a go/no-go decision based on the number of 
errors they observe. By default, a RecordReader would attempt to skip bad 
records while incrementing the error count - but, depending on the job spec, 
this may lead to the job being aborted by Hadoop.
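
For concreteness, here's a rough sketch of what that additional interface and 
the TT/JT-side check might look like. The names BadRecordCounter and 
BadRecordPolicy and the config key "mapred.max.bad.records" are made up for 
illustration - none of this is existing Hadoop API:

import java.io.IOException;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical add-on interface for RecordReaders that skip bad records.
interface BadRecordCounter {
  /** Number of bad records this reader has skipped so far. */
  long getBadRecordCount();
}

// Sketch of the framework-side go/no-go decision.
class BadRecordPolicy {
  static void check(JobConf conf, Object reader) throws IOException {
    // "mapred.max.bad.records" is a made-up knob; the default means "no limit".
    long maxBad = conf.getLong("mapred.max.bad.records", Long.MAX_VALUE);
    if (reader instanceof BadRecordCounter
        && ((BadRecordCounter) reader).getBadRecordCount() > maxBad) {
      throw new IOException("too many bad records - failing task per job policy");
    }
  }
}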

One wrinkle is that the number of errors is not the same as the amount of data 
skipped. That may be a design point (report bytes skipped versus the number of 
bad records).
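
To make the skipping itself concrete, here's an illustrative reader that 
recovers from an unreadable record by seeking to the next sync mark, tracking 
both metrics so the policy check above could key off either one. The class and 
its fields are assumptions for this sketch, not the actual 
SequenceFileRecordReader:

import java.io.IOException;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;

// Illustrative skipping reader (not the real SequenceFileRecordReader).
public class SkippingSequenceFileReader {
  private final SequenceFile.Reader in;  // underlying reader
  private final long end;                // end of this reader's split
  private long badRecordCount = 0;       // errors seen
  private long bytesSkipped = 0;         // data skipped (not the same thing)

  public SkippingSequenceFileReader(SequenceFile.Reader in, long end) {
    this.in = in;
    this.end = end;
  }

  public synchronized boolean next(Writable key, Writable value)
      throws IOException {
    while (true) {
      long startPos = in.getPosition();
      try {
        if (!in.next(key)) {
          return false;                  // clean end of file
        }
        in.getCurrentValue(value);
        return true;
      } catch (IOException e) {
        recover(startPos);
      } catch (RuntimeException e) {     // e.g. the NegativeArraySizeException below
        recover(startPos);
      }
      if (in.getPosition() >= end) {
        return false;                    // skipped past the end of the split
      }
      // otherwise loop and retry from the next chunk
    }
  }

  // Skip the unreadable chunk: seek to the next sync mark and count it.
  private void recover(long startPos) throws IOException {
    badRecordCount++;
    in.sync(startPos + 1);               // next sync mark after the bad spot
    bytesSkipped += in.getPosition() - startPos;
  }

  public long getBadRecordCount() { return badRecordCount; }
  public long getBytesSkipped() { return bytesSkipped; }
}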


> SequenceFile RecordReader should skip bad records
> -------------------------------------------------
>
>                 Key: HADOOP-3666
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3666
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>
> Currently a bad record in a SequenceFile leads to the entire job failing. 
> The best workaround is to skip the errant file manually (by looking at which 
> map task failed). This is a sucky option because it's manual and because one 
> should be able to skip a single SequenceFile block (instead of the entire file).
> While we don't see this often (and I don't know why this corruption happened) 
> - here's an example stack:
> Status : FAILED java.lang.NegativeArraySizeException
>       at org.apache.hadoop.io.BytesWritable.setCapacity(BytesWritable.java:96)
>       at org.apache.hadoop.io.BytesWritable.setSize(BytesWritable.java:75)
>       at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:130)
>       at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1640)
>       at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1712)
>       at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:79)
>       at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:176)
> Ideally the RecordReader should just skip the entire chunk if it gets an 
> unrecoverable error while reading.
> This was the consensus in HADOOP-153 as well (that data corruption should be 
> handled by RecordReaders), and HADOOP-3144 did something similar for 
> TextInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
