[ https://issues.apache.org/jira/browse/MAPREDUCE-6208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Rabe updated MAPREDUCE-6208:
---------------------------------
Assignee: Jens Rabe
Status: In Progress (was: Patch Available)
[MAPREDUCE-6208.002.patch|https://issues.apache.org/jira/secure/attachment/12689498/MAPREDUCE-6208.002.patch]
adds some @SuppressWarnings("unchecked") annotations, with explanations of why
the unchecked casts are safe there. This should remove the Java compiler warnings.
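A minimal sketch of the kind of annotated unchecked cast described above, assuming a key class that is only known at runtime as a `Class<?>` (as with a MapFile reader's key class). The names `UncheckedKeyFactory` and `newKey` are hypothetical and are not taken from the attached patch.

```java
// Hypothetical example; not code from MAPREDUCE-6208.002.patch.
public class UncheckedKeyFactory {

    // Creates a key instance when the key class is known only at runtime.
    @SuppressWarnings("unchecked")
    static <K> K newKey(Class<?> keyClass) {
        try {
            // Unchecked: K is erased to Object, so this cast is not verified here.
            // It is safe only because callers request the same type as keyClass.
            return (K) keyClass.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot instantiate key class " + keyClass.getName(), e);
        }
    }

    public static void main(String[] args) {
        // K is inferred as StringBuilder from the assignment target.
        StringBuilder key = newKey(StringBuilder.class);
        key.append("2015-01-01");
        System.out.println(key);
    }
}
```

Because `K` is erased, the cast inside `newKey` is never checked there; if a caller asks for the wrong type, the `ClassCastException` surfaces at the caller's assignment instead. That is why each suppressed cast benefits from a comment stating the invariant that makes it safe.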
> There should be an input format for MapFiles which can be configured so that
> only a fraction of the input data is used for the MR process
> -----------------------------------------------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-6208
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6208
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: trunk
> Reporter: Jens Rabe
> Assignee: Jens Rabe
> Labels: inputformat, mapfile
> Attachments: MAPREDUCE-6208.001.patch, MAPREDUCE-6208.002.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In some cases there are large amounts of data organized in MapFiles, e.g.,
> from previous MapReduce tasks, and only a fraction of the data is to be
> processed in an MR task. The current approach, as I understand it, is to
> reorganize the data into suitable partitions using folders on HDFS, use only
> the relevant folders as input paths, and possibly do some additional
> filtering in the Map task. However, sometimes the input data cannot easily be
> partitioned that way, for example when processing large amounts of measured
> data where additional data for a time period already in HDFS arrives later.
> There should be an input format that accepts folders with MapFiles, with an
> option to specify the input key range so that only InputSplits covering that
> range are generated.
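A minimal sketch of the requested split selection, assuming the first and last key of each MapFile in the input folder are already known and that keys implement `Comparable` (real Hadoop keys would be `WritableComparable`). `KeyRangeSplitFilter`, `MapFileRange` and `select` are hypothetical names, not part of Hadoop or the attached patches; a real input format would return `InputSplit` objects from `getSplits` instead of this stand-in class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the API proposed in MAPREDUCE-6208.
public class KeyRangeSplitFilter {

    // Stand-in for one MapFile in the input folder, described by the smallest
    // and largest key it contains (entries in a MapFile are sorted by key).
    static final class MapFileRange<K extends Comparable<K>> {
        final String path;
        final K firstKey;
        final K lastKey;

        MapFileRange(String path, K firstKey, K lastKey) {
            this.path = path;
            this.firstKey = firstKey;
            this.lastKey = lastKey;
        }

        @Override
        public String toString() {
            return path + " [" + firstKey + ", " + lastKey + "]";
        }
    }

    // Keeps only the files whose key range overlaps [start, end] (both bounds
    // inclusive); only these would become InputSplits.
    static <K extends Comparable<K>> List<MapFileRange<K>> select(
            List<MapFileRange<K>> files, K start, K end) {
        List<MapFileRange<K>> selected = new ArrayList<>();
        for (MapFileRange<K> file : files) {
            // Two closed intervals overlap iff each starts no later than the other ends.
            if (file.firstKey.compareTo(end) <= 0 && start.compareTo(file.lastKey) <= 0) {
                selected.add(file);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<MapFileRange<Integer>> files = new ArrayList<>();
        files.add(new MapFileRange<>("part-00000", 0, 99));
        files.add(new MapFileRange<>("part-00001", 100, 199));
        files.add(new MapFileRange<>("part-00002", 200, 299));

        // Only part-00001 and part-00002 hold keys in [150, 250].
        for (MapFileRange<Integer> f : select(files, 150, 250)) {
            System.out.println(f);
        }
    }
}
```

Files that lie entirely outside the range produce no split, so no map task is started for them. A selected file can still contain keys outside the range at its edges, so the record reader (or the Map task) would still need to skip those records.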
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)