Contribution: FixedLengthInputFormat and FixedLengthRecordReader
----------------------------------------------------------------

                 Key: MAPREDUCE-1176
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
    Affects Versions: 0.20.1
         Environment: Any
            Reporter: BitsOfInfo
            Priority: Minor


Hello,
I would like to contribute the following two classes for incorporation into the
mapreduce.lib.input package. They can be used to read data from files containing
fixed-length (fixed-width) records. Such files have no CR/LF (or any combination
thereof) and no delimiters. Instead, every record is the same length, and any
unused space in a record is padded with spaces. In effect, the whole file is one
gigantic line.
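To make the layout concrete, the following sketch splits an in-memory byte array
into fixed-length records and trims the space padding. It assumes single-byte
(US-ASCII) text and padding on the right of each record. The class
FixedWidthParser is hypothetical and is not part of the contributed code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the contributed classes.
final class FixedWidthParser {
    /** Splits data into records of recordLength bytes, trimming trailing space padding. */
    static List<String> parse(byte[] data, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        // Without delimiters, a trailing partial record cannot be interpreted.
        if (data.length % recordLength != 0) {
            throw new IllegalArgumentException("data is not a whole number of records");
        }
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += recordLength) {
            // Assumption: padding is appended on the right, so drop trailing spaces only.
            int end = offset + recordLength;
            while (end > offset && data[end - 1] == ' ') {
                end--;
            }
            records.add(new String(data, offset, end - offset, StandardCharsets.US_ASCII));
        }
        return records;
    }
}
```

For example, parsing the 15-byte string "abc  de   f    " with a record length of 5
yields the three records "abc", "de" and "f".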

Two classes are provided: FixedLengthInputFormat and its corresponding
FixedLengthRecordReader. A job that uses this input format must set the
"mapreduce.input.fixedlengthinputformat.record.length" property, as follows:

myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);

OR

myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
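Because there is no sensible default record length, the input format has to
obtain it from the job configuration before computing any splits. The sketch
below shows one way that lookup and validation might look. It uses
java.util.Properties as a stand-in for Hadoop's job configuration so that it runs
without Hadoop on the classpath; the class RecordLengthCheck and the method
requireRecordLength are hypothetical.

```java
import java.util.Properties;

// Hypothetical stand-in; a real implementation would read Hadoop's job configuration.
final class RecordLengthCheck {
    // Property name taken from the issue text.
    static final String FIXED_RECORD_LENGTH =
        "mapreduce.input.fixedlengthinputformat.record.length";

    /** Returns the configured record length, or throws if it is missing or not positive. */
    static int requireRecordLength(Properties conf) {
        String value = conf.getProperty(FIXED_RECORD_LENGTH);
        if (value == null) {
            throw new IllegalStateException(FIXED_RECORD_LENGTH + " must be set");
        }
        // Integer.parseInt throws NumberFormatException for non-numeric values.
        int length = Integer.parseInt(value.trim());
        if (length <= 0) {
            throw new IllegalStateException(
                FIXED_RECORD_LENGTH + " must be positive, got " + length);
        }
        return length;
    }
}
```

Failing fast here gives a clear error at job setup instead of silently
mis-slicing the input later.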

This input format overrides computeSplitSize() to ensure that InputSplits never
contain partial records: because there are no delimiters, a reader that started
in the middle of a record would have no way to find where the next record
begins. Each InputSplit passed to the FixedLengthRecordReader therefore starts at
the beginning of a record, and its last byte is the last byte of a record. The
override of computeSplitSize() delegates to FileInputFormat's computeSplitSize()
and then rounds the returned split size down to a whole number of records:

  Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength
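The rounding step can be illustrated with a small, self-contained sketch. The
class FixedLengthSplits and its methods are hypothetical; splitStarts() is a
simplified model of split placement and ignores any special handling of the
final remainder that FileInputFormat.getSplits() may apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the split-size adjustment described above.
final class FixedLengthSplits {
    /** Rounds a proposed split size down to a whole number of records. */
    static long adjustSplitSize(long computedSplitSize, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        // For non-negative operands, long division truncates, which equals Math.floor.
        return (computedSplitSize / recordLength) * recordLength;
    }

    /** Simplified model: start offsets of consecutive splits of splitSize bytes. */
    static List<Long> splitStarts(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            starts.add(start);
        }
        return starts;
    }
}
```

For example, with 100-byte records a computed split size of 67108864 bytes
(64 MB) becomes 67108800 bytes, exactly 671088 records. With 30-byte records and
a computed split size of 64 bytes, the adjusted size is 60, so a 300-byte file
is split at offsets 0, 60, 120, 180 and 240, each of which is a record boundary.
Note that a computed split size smaller than one record would round down to 0,
so a real implementation would presumably need to guard against that case.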

This suite of fixed-length input format classes does not support compressed
files.
