[jira] [Commented] (MAPREDUCE-6208) There should be an input format for MapFiles which can be configured so that only a fraction of the input data is used for the MR process

Hadoop QA (JIRA) Tue, 30 Dec 2014 06:16:37 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261114#comment-14261114 ]


Hadoop QA commented on MAPREDUCE-6208:
--------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12689498/MAPREDUCE-6208.002.patch
  against trunk revision 249cc90.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 2 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:green}+1 eclipse:eclipse{color}.  The patch built with eclipse:eclipse.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 13 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5097//console

This message is automatically generated.

> There should be an input format for MapFiles which can be configured so that 
> only a fraction of the input data is used for the MR process
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: trunk
>            Reporter: Jens Rabe
>            Assignee: Jens Rabe
>              Labels: inputformat, mapfile
>         Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In some cases there are large amounts of data organized in MapFiles, e.g., 
> from previous MapReduce tasks, and only a fraction of the data is to be 
> processed in an MR task. The current approach, as I understand it, is to 
> re-organize the data into suitable partitions using folders on HDFS, use only 
> the relevant folders as input paths, and possibly do some additional 
> filtering in the Map task. However, sometimes the input data cannot be easily 
> partitioned that way, for example when processing large amounts of measured 
> data where additional data for a time period already in HDFS arrives later.
> There should be an input format that accepts folders with MapFiles, and there 
> should be an option to specify the input key range so that only InputSplits 
> overlapping that range are generated.
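The key-range idea in the description can be illustrated with a small, stdlib-only sketch. It is not the code from the attached patches and does not use Hadoop's API: `KeyRangeSplitFilter`, `MapFileSplit` and `filterSplits` are hypothetical names, keys are plain `String`s compared lexicographically (a real input format would use the MapFile's key comparator), and each split is assumed to already know its first and last key. A split is kept only if its closed key range overlaps the requested `[startKey, endKey]`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of key-range split selection; not the MAPREDUCE-6208 patch. */
public class KeyRangeSplitFilter {

    /** Hypothetical stand-in for an InputSplit over one MapFile, with its first and last key. */
    static final class MapFileSplit {
        final String path;
        final String firstKey;
        final String lastKey;

        MapFileSplit(String path, String firstKey, String lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }

        @Override
        public String toString() {
            return path + "[" + firstKey + ".." + lastKey + "]";
        }
    }

    /**
     * Returns the splits whose closed range [firstKey, lastKey] intersects the
     * closed range [startKey, endKey]. Keys are compared lexicographically here
     * (an assumption; real MapFile keys use their own comparator).
     */
    static List<MapFileSplit> filterSplits(List<MapFileSplit> splits, String startKey, String endKey) {
        List<MapFileSplit> result = new ArrayList<>();
        for (MapFileSplit s : splits) {
            boolean endsBeforeRange = s.lastKey.compareTo(startKey) < 0;
            boolean startsAfterRange = s.firstKey.compareTo(endKey) > 0;
            if (!endsBeforeRange && !startsAfterRange) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical MapFiles keyed by ISO dates, which sort correctly as strings.
        List<MapFileSplit> splits = List.of(
            new MapFileSplit("part-00000", "2014-01-01", "2014-03-31"),
            new MapFileSplit("part-00001", "2014-04-01", "2014-06-30"),
            new MapFileSplit("part-00002", "2014-07-01", "2014-09-30"));
        // Prints the two splits that overlap May through July.
        System.out.println(filterSplits(splits, "2014-05-01", "2014-07-31"));
    }
}
```

Because only split boundaries are checked, a split that partially overlaps the range is still read in full, so the records at its edges would still need the per-record filtering in the Map task (or a seek to the start key) that the description mentions.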



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
