[ https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mariappan Asokan updated MAPREDUCE-1176:
----------------------------------------
Attachment: mapreduce-1176_v1.patch
> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 0.20.1, 0.20.2
> Environment: Any
> Reporter: BitsOfInfo
> Assignee: Mariappan Asokan
> Attachments: mapreduce-1176_v1.patch, MAPREDUCE-1176-v1.patch,
> MAPREDUCE-1176-v2.patch, MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into
> the mapreduce.lib.input package. These two classes can be used when you need
> to read data from files containing fixed-length (fixed-width) records. Such
> files have no CR/LF (or any combination thereof) and no delimiters. Instead,
> every record has the same fixed length, and any unused space in a record is
> padded with spaces. In effect, the entire file is one very long line.
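A minimal Java sketch of reading such a file, assuming single-byte (ASCII) text: record i occupies bytes i * recordLength through (i + 1) * recordLength - 1, so a reader only has to pull exactly recordLength bytes at a time. The class and method names (FixedLengthRecords, readRecords) are hypothetical. Converting each record to a String with its space padding trimmed is an illustrative choice. The proposal does not specify it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedLengthRecords {
    // Splits a delimiter-free stream into records of exactly recordLength bytes.
    // Record i occupies bytes [i * recordLength, (i + 1) * recordLength).
    static List<String> readRecords(InputStream in, int recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        List<String> records = new ArrayList<>();
        byte[] buf = new byte[recordLength];
        try {
            int n;
            // readNBytes blocks until recordLength bytes are read or the stream ends.
            while ((n = in.readNBytes(buf, 0, recordLength)) > 0) {
                if (n < recordLength) {
                    throw new IllegalArgumentException("trailing partial record of " + n + " bytes");
                }
                // Assumes single-byte (ASCII) text; drops the trailing space padding.
                records.add(stripTrailingSpaces(new String(buf, StandardCharsets.US_ASCII)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    // Removes trailing ' ' characters only (not tabs or other whitespace).
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        // Three 8-byte records packed into one long "line" with no delimiters.
        byte[] file = "alice   bob     carol   ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readRecords(new ByteArrayInputStream(file), 8)); // [alice, bob, carol]
    }
}
```

A trailing fragment shorter than one record is rejected rather than silently dropped, since it indicates that the file length is not a multiple of the record length.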
> Two classes are provided: FixedLengthInputFormat and its corresponding
> FixedLengthRecordReader. When creating a job that specifies this input
> format, the job must have the
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows:
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, [myFixedRecordLength]);
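The requirement that the record-length property "must" be set implies a fail-fast check when it is missing. The hypothetical helper below sketches such a check, also rejecting values that are not positive. A plain Map stands in for the job configuration so the example runs without Hadoop. In a real job the value would come from the configuration, for example via Hadoop's Configuration.getInt(String, int). RecordLengthConfig and getRecordLength are illustrative names and are not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class RecordLengthConfig {
    // Property name taken from the proposal above.
    static final String FIXED_RECORD_LENGTH =
        "mapreduce.input.fixedlengthinputformat.record.length";

    // Returns the configured record length, failing fast if it is missing or not positive.
    static int getRecordLength(Map<String, String> conf) {
        String value = conf.get(FIXED_RECORD_LENGTH);
        if (value == null) {
            throw new IllegalArgumentException(FIXED_RECORD_LENGTH + " is not set");
        }
        // NumberFormatException (a subclass of IllegalArgumentException) if not an integer.
        int length = Integer.parseInt(value.trim());
        if (length <= 0) {
            throw new IllegalArgumentException(
                FIXED_RECORD_LENGTH + " must be positive, got " + length);
        }
        return length;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FIXED_RECORD_LENGTH, Integer.toString(100));
        System.out.println(getRecordLength(conf)); // prints 100
    }
}
```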
> This input format overrides computeSplitSize() to ensure that InputSplits
> never contain partial records. With fixed-length records and no delimiters,
> there is no way to tell where a record begins if a split starts in the
> middle of one. Each InputSplit passed to the FixedLengthRecordReader
> therefore starts at the beginning of a record and ends with the last byte
> of a record. The override of computeSplitSize() delegates to
> FileInputFormat's computeSplitSize() and then rounds the returned split size
> down to a whole number of records:
> (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength)
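The split-size adjustment above can be sketched in plain Java. Here baseSplitSize mirrors the body of the new-API FileInputFormat.computeSplitSize(blockSize, minSize, maxSize), that is, Math.max(minSize, Math.min(maxSize, blockSize)). fixedLengthSplitSize then rounds that value down to a multiple of the record length. Integer division on non-negative longs already floors, so no Math.floor call is needed. The class name is hypothetical, and so is the final guard. That guard stops a split from shrinking to zero when the base size is smaller than one record. It is an assumption added for this sketch and is not part of the proposal.

```java
public class FixedLengthSplits {
    // Same formula as the new-API FileInputFormat.computeSplitSize(blockSize, minSize, maxSize).
    static long baseSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rounds the base split size down to a whole number of records.
    // (base / recordLength) * recordLength equals
    // Math.floor(base / recordLength) * recordLength for non-negative values.
    static long fixedLengthSplitSize(long blockSize, long minSize, long maxSize, int recordLength) {
        long base = baseSplitSize(blockSize, minSize, maxSize);
        long rounded = (base / recordLength) * recordLength;
        // Guard (an addition, not in the proposal): never return less than one record.
        return Math.max(rounded, recordLength);
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 67,108,864 bytes
        int recordLength = 100;
        long split = fixedLengthSplitSize(blockSize, 1L, Long.MAX_VALUE, recordLength);
        System.out.println(split);                // 67108800
        System.out.println(split % recordLength); // 0
    }
}
```

With a 64 MB block and 100-byte records, the split size drops from 67,108,864 to 67,108,800 bytes, so every split boundary falls on a record boundary.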
> These fixed-length input format classes do not support compressed files.
--
This message is automatically generated by JIRA.