[ https://issues.apache.org/jira/browse/HADOOP-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627549#action_12627549 ]
Hadoop QA commented on HADOOP-4010:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12389242/Hadoop-4010_version2.patch
against trunk revision 690641.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/console
This message is automatically generated.
> Changing LineRecordReader algorithm so that it does not need to skip backwards
> in the stream
> --------------------------------------------------------------------------------------
>
> Key: HADOOP-4010
> URL: https://issues.apache.org/jira/browse/HADOOP-4010
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Abdul Qadeer
> Assignee: Abdul Qadeer
> Fix For: 0.19.0
>
> Attachments: Hadoop-4010.patch, Hadoop-4010_version2.patch
>
>
> The current LineRecordReader algorithm needs to move backwards in the stream
> (in its constructor) to position itself correctly. It seeks back one byte from
> the start of its split, reads one record (i.e. a line), and throws it away,
> because that line is guaranteed to be handled by some other mapper. This
> approach is awkward and inefficient for a compressed stream, where the data
> reaches the LineRecordReader through a codec. (In the current implementation,
> Hadoop does not split a compressed file; it makes a single split covering the
> whole file, so only one mapper handles it. We are currently working on a
> splittable BZip2 codec for Hadoop, so the proposed change will make it
> possible to handle plain and compressed streams uniformly.)
> In the new algorithm, each mapper always skips its first line, because that
> line is guaranteed to have been read by some other mapper. In exchange, each
> mapper must finish reading at a record boundary, which is always beyond its
> upper split limit. With this change, the LineRecordReader no longer needs to
> move backwards in the stream.
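>
> To make the forward-only positioning concrete, here is a minimal sketch of the
> rule described above, written against a plain local file. It is illustrative
> only: the class and method names are invented and this is not the actual
> LineRecordReader code from the patch. A reader skips its first line unless its
> split starts at offset 0, and keeps reading whole lines as long as each line
> starts at or before the split's upper limit, so its last record may finish
> beyond that limit.
>
>   import java.io.IOException;
>   import java.io.RandomAccessFile;
>   import java.util.ArrayList;
>   import java.util.List;
>
>   public class SplitLineReaderSketch {
>
>     // Forward-only positioning: never seek backwards. A split that does not
>     // start at offset 0 discards its first (possibly partial) line, because
>     // the reader of the previous split reads past its own end to finish it.
>     static List<String> readSplit(RandomAccessFile in, long start, long end)
>         throws IOException {
>       List<String> records = new ArrayList<>();
>       in.seek(start);
>       long pos = start;
>       if (start != 0) {
>         in.readLine();                  // belongs to the previous split
>         pos = in.getFilePointer();
>       }
>       // Read a line whenever it starts at or before the split end; the last
>       // record read may therefore end beyond the split boundary.
>       while (pos <= end) {
>         String line = in.readLine();
>         if (line == null) {             // end of file
>           break;
>         }
>         records.add(line);
>         pos = in.getFilePointer();
>       }
>       return records;
>     }
>
>     public static void main(String[] args) throws IOException {
>       try (RandomAccessFile in = new RandomAccessFile(args[0], "r")) {
>         long mid = in.length() / 2;     // two artificial splits of one file
>         List<String> first = readSplit(in, 0, mid);
>         List<String> second = readSplit(in, mid, in.length());
>         System.out.println("split 1: " + first.size() + " records");
>         System.out.println("split 2: " + second.size() + " records");
>       }
>     }
>   }
>
> Because a reader only discards the line at its own start and reads one record
> past its own end, consecutive splits neither drop nor duplicate a line, and no
> backward seek is ever needed; that is what makes the same logic usable when
> the bytes arrive through a (splittable) codec rather than from a seekable
> plain file.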
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.