[
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817879#comment-13817879
]
Sandy Ryza commented on MAPREDUCE-1176:
---------------------------------------
Thanks for picking this up [~masokan].
A few minor stylistic nits:
{code}
+ private static final Log LOG
+ =
LogFactory.getLog(FixedLengthRecordReader.class);
{code}
{code}
+ CompressionInputStream cIn
+ = codec.createInputStream(fileIn, decompressor);
{code}
{code}
+ public static final String FIXED_RECORD_LENGTH =
+
"fixedlengthinputformat.record.length";
{code)}
The second line should only be indented four spaces past where the text on the
first line starts. There might be a couple more of these to fix.
{code}
+ while(numBytesToRead > 0) {
{code}
Need a space after "while"
{code}
+ if (numBytesRead == -1) // EOF
+ break;
{code}
Curly braces should be used even in one line if blocks. This applies to a
couple other places in the patch as well.
{code}
+ if (! isCompressedInput) {
{code}
No space needed between the exclamation point and the variable name.
{code}
+ return(null == codec);
{code}
Add a space after "return".
Can we rename numRecordsInSplit to numRecordsRemainingInSplit to emphasize that
it gets decremented as we read?
If we're going to include random tests, we should log the seed loudly so that
failures can be reproduced.
If possible, the static block at the beginning of the test class should be
moved to a static method with the JUnit BeforeClass annotation.
Other than these, the patch looks good to me.
> Contribution: FixedLengthInputFormat and FixedLengthRecordReader
> ----------------------------------------------------------------
>
> Key: MAPREDUCE-1176
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.1.0-beta, 2.0.5-alpha
> Environment: Any
> Reporter: BitsOfInfo
> Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch,
> MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch, mapreduce-1176_v1.patch,
> mapreduce-1176_v2.patch
>
>
> Hello,
> I would like to contribute the following two classes for incorporation into
> the mapreduce.lib.input package. These two classes can be used when you need
> to read data from files containing fixed length (fixed width) records. Such
> files have no CR/LF (or any combination thereof), no delimiters etc, but each
> record is a fixed length, and extra data is padded with spaces. The data is
> one gigantic line within a file.
> Provided are two classes first is the FixedLengthInputFormat and its
> corresponding FixedLengthRecordReader. When creating a job that specifies
> this input format, the job must have the
> "mapreduce.input.fixedlengthinputformat.record.length" property set as follows
> myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",[myFixedRecordLength]);
> OR
> myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH,
> [myFixedRecordLength]);
> This input format overrides computeSplitSize() in order to ensure that
> InputSplits do not contain any partial records since with fixed records there
> is no way to determine where a record begins if that were to occur. Each
> InputSplit passed to the FixedLengthRecordReader will start at the beginning
> of a record, and the last byte in the InputSplit will be the last byte of a
> record. The override of computeSplitSize() delegates to FileInputFormat's
> compute method, and then adjusts the returned split size by doing the
> following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength)
> * fixedRecordLength)
> This suite of fixed length input format classes, does not support compressed
> files.
--
This message was sent by Atlassian JIRA
(v6.1#6144)