[jira] Updated: (HADOOP-788) Streaming should use a subclass of TextInputFormat for reading text inputs.

Sanjay Dahiya (JIRA) Wed, 31 Jan 2007 01:53:32 -0800

     [ https://issues.apache.org/jira/browse/HADOOP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sanjay Dahiya updated HADOOP-788:
---------------------------------

    Status: Patch Available  (was: Open)

> Streaming should use a subclass of TextInputFormat for reading text inputs.
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-788
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Owen O'Malley
>         Assigned To: Sanjay Dahiya
>         Attachments: Hadoop-788.patch
>
>
> Currently, streaming uses a lot of custom code for processing text inputs.
> I propose:
>  1. Move class LineRecordReader out of TextInputFormat.
>  2. Make class StreamLineRecordReader extend LineRecordReader.
>  3. Have StreamLineRecordReader use LineRecordReader.next to read the lines
> and split them on tab to generate a Text/Text key/value pair.
> This will remove a lot of code from streaming and give it automatic support
> for the compression codecs that the "base" part of Hadoop enjoys. In
> particular, if the native zlib code is used, it will remove the 2 GB limit on
> compressed files.
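
The proposal above (a line reader subclass that splits each line on tab into a key/value pair) can be sketched roughly as follows. This is a minimal illustration, not the attached patch: LineReaderSketch and StreamLineReaderSketch are hypothetical stand-ins for LineRecordReader and StreamLineRecordReader, plain Strings stand in for Text, and it assumes the split happens at the first tab and that a line without a tab becomes a key with an empty value.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical stand-in for the base line reader; not the Hadoop class. */
class LineReaderSketch {
    private final BufferedReader in;

    LineReaderSketch(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next line without its terminator, or null at end of input. */
    String nextLine() throws IOException {
        return in.readLine();
    }
}

/** Hypothetical subclass: reuses the base line reading and only adds the tab split. */
class StreamLineReaderSketch extends LineReaderSketch {
    StreamLineReaderSketch(BufferedReader in) {
        super(in);
    }

    /** Returns a two-element array {key, value}, or null at end of input. */
    String[] nextKeyValue() throws IOException {
        String line = nextLine();
        return line == null ? null : splitOnTab(line);
    }

    /**
     * Splits a line at the first tab (assumption). If there is no tab,
     * the whole line is the key and the value is empty (assumption).
     */
    static String[] splitOnTab(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1) };
    }

    public static void main(String[] args) throws IOException {
        BufferedReader input = new BufferedReader(new StringReader("apple\t1\nbanana\n"));
        StreamLineReaderSketch reader = new StreamLineReaderSketch(input);
        String[] kv;
        while ((kv = reader.nextKeyValue()) != null) {
            System.out.println(kv[0] + " => " + kv[1]);
        }
    }
}
```

Because the subclass only adds the split, anything the base reader handles (such as decompressing the input stream) would carry over without extra streaming-specific code, which is the point of the proposal.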

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
