[jira] Updated: (HADOOP-1204) Re-factor InputFormat/RecordReader related classes

Runping Qi (JIRA) Tue, 10 Apr 2007 11:09:58 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Runping Qi updated HADOOP-1204:
-------------------------------

    Attachment:     (was: patch-1204.txt)

> Re-factor InputFormat/RecordReader related classes
> --------------------------------------------------
>
>                 Key: HADOOP-1204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1204
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> This Jira is a first small step toward unifying the input format and record 
> reader code used by Hadoop streaming with the main Hadoop framework.
> It cleans up the related parts of the main framework as follows:
> 1. Add a constructor 
>        public LineRecordReader(Configuration job, FileSplit split)
> to LineRecordReader. This gives the constructors of 
> SequenceFileRecordReader and LineRecordReader the same signature, which 
> makes it easier to write a factory class that creates the various record 
> readers once the record reader classes for Hadoop streaming are brought 
> into the main framework.
> 2. Implement the next() method using the following newly added protected 
> method of the LineRecordReader class:
>      protected long readLine() throws IOException {
>          return LineRecordReader.readLine(in, buffer);
>      }
>     This lets a subclass override the readLine logic to use a different 
> line terminator (e.g. treat '\r' as part of the data, not as a line break).
> 3. Rename class InputFormatBase to FileInputFormat to better reflect the 
> functionality of the class.
> To preserve backward compatibility, keep the InputFormatBase class, but make 
> it a deprecated shell that simply extends FileInputFormat.
> 4. Change TextInputFormat and SequenceFileInputFormat to extend FileInputFormat.
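
The overridable readLine() hook from item 2 can be illustrated with a minimal, self-contained sketch. This is not the Hadoop code: SketchLineReader, NewlineOnlyReader, and ReadLineSketch are hypothetical stand-ins, and the default line policy shown here (both '\r' and '\n' end a line) is an assumption made for illustration. The point is only that next() calls a protected hook, so a subclass can change the line terminator without touching next().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LineRecordReader; not the Hadoop class.
class SketchLineReader {
    protected final PushbackInputStream in;
    protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    SketchLineReader(PushbackInputStream in) {
        this.in = in;
    }

    // Assumed default policy: '\n', '\r', or "\r\n" ends a line.
    // Returns the number of bytes consumed; 0 means end of input.
    static long readLine(PushbackInputStream in, ByteArrayOutputStream out)
            throws IOException {
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') {
                break;
            }
            if (b == '\r') {
                int next = in.read();
                if (next == '\n') {
                    consumed++;          // swallow the '\n' of a "\r\n" pair
                } else if (next != -1) {
                    in.unread(next);     // not part of the terminator
                }
                break;
            }
            out.write(b);
        }
        return consumed;
    }

    // Overridable hook, mirroring the protected readLine() in the description.
    protected long readLine() throws IOException {
        return readLine(in, buffer);
    }

    // next() goes through the hook, so subclasses only change the line policy.
    String next() throws IOException {
        buffer.reset();
        if (readLine() == 0) {
            return null;
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}

// Hypothetical subclass: only '\n' ends a line; '\r' is kept as data.
class NewlineOnlyReader extends SketchLineReader {
    NewlineOnlyReader(PushbackInputStream in) {
        super(in);
    }

    @Override
    protected long readLine() throws IOException {
        long consumed = 0;
        int b;
        while ((b = in.read()) != -1) {
            consumed++;
            if (b == '\n') {
                break;
            }
            buffer.write(b);
        }
        return consumed;
    }
}

class ReadLineSketch {
    static PushbackInputStream stream(String s) {
        return new PushbackInputStream(
                new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
    }

    // Drains a reader into a list of lines.
    static List<String> readAll(SketchLineReader reader) {
        List<String> lines = new ArrayList<>();
        try {
            for (String line = reader.next(); line != null; line = reader.next()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String data = "a\rb\nc\n";
        System.out.println(readAll(new SketchLineReader(stream(data))).size());
        System.out.println(readAll(new NewlineOnlyReader(stream(data))).size());
    }
}
```

With the input "a\rb\nc\n", the default reader in this sketch splits at the '\r' and yields three lines, while the subclass keeps the '\r' inside the first line and yields two.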

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
