[ 
https://issues.apache.org/jira/browse/HADOOP-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated HADOOP-3828:
-----------------------------------

    Attachment: 3828_v1.patch

This works as follows:
Each skipped record (key, value) is written to a SequenceFile.
By default the skipped records are written to the folder "_skip" in the output
dir. This is configurable using SkipBadRecords.setSkipOutputPath.
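
As a rough illustration of the default-versus-override behaviour described
above, here is a minimal sketch in plain Java. It does not use the Hadoop API:
the class SkipPathSketch and the helper resolveSkipDir are hypothetical names
made up for this example, and the patch itself may resolve the path differently.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SkipPathSketch {
    // Default folder name, taken from the description above.
    static final String DEFAULT_SKIP_DIR = "_skip";

    // Hypothetical helper (not part of Hadoop): returns the configured skip
    // directory if one was set, otherwise "_skip" under the job output dir.
    static Path resolveSkipDir(Path outputDir, Path configuredSkipDir) {
        if (configuredSkipDir != null) {
            return configuredSkipDir;
        }
        return outputDir.resolve(DEFAULT_SKIP_DIR);
    }

    public static void main(String[] args) {
        Path out = Paths.get("out");
        // No override configured: falls back to out/_skip.
        System.out.println(resolveSkipDir(out, null));
        // Override configured: the override is used as-is.
        System.out.println(resolveSkipDir(out, Paths.get("bad-records")));
    }
}
```

In a real job the override would presumably be set through
SkipBadRecords.setSkipOutputPath rather than passed around as an argument.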

- The patch also fixes a corner case by initializing the variable "skipping" in
TaskInProgress.
- It also makes some changes in SortedRanges: made it cloneable and fixed
serialization of a member variable.
- Cleanup in MapTask by having a different implementation of RecordReader for
normal mode (skipping=false).

> Write skipped records' bytes to DFS
> -----------------------------------
>
>                 Key: HADOOP-3828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3828
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Sharad Agarwal
>            Assignee: Sharad Agarwal
>         Attachments: 3828_v1.patch
>
>
> This is an incremental step over HADOOP-153, which provides the base skipping 
> functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
