[
https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261114#comment-14261114
]
Hadoop QA commented on MAPREDUCE-6208:
--------------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12689498/MAPREDUCE-6208.002.patch
against trunk revision 249cc90.
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with
eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 13 new
Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.
Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//console
This message is automatically generated.
> There should be an input format for MapFiles which can be configured so that
> only a fraction of the input data is used for the MR process
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Jens Rabe
> Assignee: Jens Rabe
> Labels: inputformat, mapfile
> Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In some cases, large amounts of data are organized in MapFiles, e.g.,
> from previous MapReduce tasks, and only a fraction of that data is to be
> processed in an MR task. The current approach, as I understand it, is to
> reorganize the data into a suitable partitioning using folders on HDFS, use
> only the relevant folders as input paths, and possibly do some additional
> filtering in the Map task. However, sometimes the input data cannot easily be
> partitioned that way, for example when processing large amounts of measured
> data where additional data for a time period already in HDFS arrives later.
> There should be an input format that accepts folders with MapFiles, with an
> option to specify the input key range so that only matching InputSplits are
> generated.
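To illustrate the key-range idea in the description above, here is a minimal sketch in plain Java. It is not the code from the attached patches and does not use any Hadoop API. The class name `KeyRangeSplitSketch`, the `PartRange` holder, and the use of `long` keys are all assumptions made for illustration. It also assumes the first and last key of each MapFile part have already been read elsewhere, which is plausible because a MapFile is sorted by key.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: pick the MapFile parts that may hold keys in [lo, hi]. */
public class KeyRangeSplitSketch {

    /** First and last key of one MapFile part (illustrative holder, not a Hadoop type). */
    public static final class PartRange {
        public final String path;
        public final long firstKey;
        public final long lastKey;

        public PartRange(String path, long firstKey, long lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }
    }

    /** True if the closed intervals [firstKey, lastKey] and [lo, hi] intersect. */
    public static boolean overlaps(long firstKey, long lastKey, long lo, long hi) {
        return firstKey <= hi && lastKey >= lo;
    }

    /** Keep only the parts whose key range intersects the requested range. */
    public static List<String> selectParts(List<PartRange> parts, long lo, long hi) {
        List<String> selected = new ArrayList<>();
        for (PartRange p : parts) {
            if (overlaps(p.firstKey, p.lastKey, lo, hi)) {
                selected.add(p.path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<PartRange> parts = new ArrayList<>();
        parts.add(new PartRange("part-00000", 0L, 99L));
        parts.add(new PartRange("part-00001", 100L, 199L));
        parts.add(new PartRange("part-00002", 200L, 299L));
        // Only parts whose key range touches [150, 220] would become splits.
        System.out.println(selectParts(parts, 150L, 220L));
    }
}
```

In a real input format, the same overlap check would presumably be applied while generating InputSplits, and records outside the range inside a selected part would still need to be skipped by the record reader or the Map task.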