[ https://issues.apache.org/jira/browse/HADOOP-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Runping Qi updated HADOOP-1204:
-------------------------------

    Attachment:     (was: patch-1204.txt)

> Re-factor InputFormat/RecordReader related classes
> --------------------------------------------------
>
>                 Key: HADOOP-1204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1204
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>         Assigned To: Runping Qi
>
> This Jira is the first small step toward unifying the InputFormat/RecordReader
> code used by streaming with the Hadoop main framework.
> It cleans up the related parts of the main framework as follows.
>
> 1. Add a constructor
>     public LineRecordReader(Configuration job, FileSplit split)
> to LineRecordReader. This gives the constructors of SequenceFileRecordReader
> and LineRecordReader the same signature, which makes it easier to add a
> factory class that creates the various record readers once the record reader
> classes for Hadoop streaming are brought into the main framework.
>
> 2. Implemented the next() method using the following newly added protected
> method of the LineRecordReader class:
>     protected long readLine() throws IOException {
>         return LineRecordReader.readLine(in, buffer);
>     }
> This lets a subclass override the readLine logic to use a different
> line-break rule (e.g. treat '\r' as part of the data, not as a line break).
>
> 3. Rename class InputFormatBase to FileInputFormat to better reflect what the
> class does.
> For backward compatibility, InputFormatBase is kept as a deprecated thin
> class that simply extends FileInputFormat.
>
> 4. Change TextInputFormat and SequenceFileInputFormat to extend
> FileInputFormat.
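
To illustrate item 2, here is a minimal, self-contained sketch of the override hook. It does not use the Hadoop classes: SimpleLineReader and NewlineOnlyReader are hypothetical stand-ins written for this example, and the default line-break rule shown ('\n', '\r' or "\r\n") is an assumption about the base behaviour rather than a copy of LineRecordReader.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLineHookSketch {

    /** Hypothetical stand-in for LineRecordReader; not the Hadoop class. */
    static class SimpleLineReader {
        protected final InputStream in;
        protected final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        SimpleLineReader(InputStream in) {
            // mark/reset support is needed to peek at the byte after a '\r'.
            this.in = in.markSupported() ? in : new BufferedInputStream(in);
        }

        /** Assumed default rule: '\n', '\r' and "\r\n" each end a line. Returns bytes consumed. */
        static long readLine(InputStream in, ByteArrayOutputStream out) throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') {
                    break;
                }
                if (b == '\r') {
                    in.mark(1);
                    int next = in.read();
                    if (next == '\n') {
                        consumed++;          // swallow the '\n' of a "\r\n" pair
                    } else if (next != -1) {
                        in.reset();          // not part of the terminator; put it back
                    }
                    break;
                }
                out.write(b);
            }
            return consumed;
        }

        /** Overridable hook, mirroring the protected readLine() described in item 2. */
        protected long readLine() throws IOException {
            return readLine(in, buffer);
        }

        /** Returns the next line, or null once the input is exhausted. */
        String next() throws IOException {
            buffer.reset();
            long consumed = readLine();
            if (consumed == 0) {
                return null;
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Subclass that treats only '\n' as a line break, so '\r' stays in the data. */
    static class NewlineOnlyReader extends SimpleLineReader {
        NewlineOnlyReader(InputStream in) {
            super(in);
        }

        @Override
        protected long readLine() throws IOException {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;
                if (b == '\n') {
                    break;
                }
                buffer.write(b);
            }
            return consumed;
        }
    }

    /** Drains a reader into a list of lines. */
    static List<String> readAll(SimpleLineReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.next()) != null) {
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a\rb\nc".getBytes(StandardCharsets.UTF_8);
        // Default rule splits on '\r' as well: 3 lines.
        System.out.println(readAll(new SimpleLineReader(new ByteArrayInputStream(data))).size());
        // Overridden rule keeps '\r' as data: 2 lines.
        System.out.println(readAll(new NewlineOnlyReader(new ByteArrayInputStream(data))).size());
    }
}
```

Because next() calls the instance method rather than the static helper directly, a subclass only has to replace readLine() to change how records are delimited; the rest of the reader is reused unchanged.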