nicu marasoiu created MAPREDUCE-5287:
----------------------------------------
Summary: Create a generic InputFormat wrapping any other
InputFormat, to control the number of map tasks
Key: MAPREDUCE-5287
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5287
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1, performance
Reporter: nicu marasoiu
I wrote a generic InputFormat that wraps any other InputFormat and creates
CompositeInputSplits, reducing the number of map tasks in a controllable manner
while preserving data locality. A corresponding CompositeRecordReader iterates
through the underlying RecordReaders, which the wrapped InputFormat creates for
each underlying raw split.
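The issue text does not include the implementation, so what follows is only a
minimal sketch of the idea under stated assumptions, not the MAPREDUCE-5287
patch. It models each raw split as a hypothetical `RawSplit` record with a
single preferred host (real Hadoop `InputSplit`s can report several locations).
It groups those splits by host and caps each composite at a hypothetical
`maxPerComposite` parts, which is the control knob assumed here. The `chain`
helper stands in for the CompositeRecordReader: it drains one underlying
reader, then moves on to the next.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class CompositeSplitSketch {

    /** Hypothetical stand-in for a raw InputSplit: a name plus one preferred host. */
    record RawSplit(String name, String host) {}

    /** Hypothetical stand-in for a CompositeInputSplit: raw splits sharing a host. */
    record CompositeSplit(String host, List<RawSplit> parts) {}

    /**
     * Groups raw splits by preferred host, then packs each host's splits into
     * composites of at most maxPerComposite parts. Fewer composites means fewer
     * map tasks, and each composite stays local to a single host.
     */
    static List<CompositeSplit> group(List<RawSplit> raw, int maxPerComposite) {
        if (maxPerComposite < 1) {
            throw new IllegalArgumentException("maxPerComposite must be >= 1");
        }
        // LinkedHashMap keeps hosts in first-seen order so the output is deterministic.
        Map<String, List<RawSplit>> byHost = new LinkedHashMap<>();
        for (RawSplit s : raw) {
            byHost.computeIfAbsent(s.host(), h -> new ArrayList<>()).add(s);
        }
        List<CompositeSplit> out = new ArrayList<>();
        for (Map.Entry<String, List<RawSplit>> e : byHost.entrySet()) {
            List<RawSplit> parts = e.getValue();
            for (int i = 0; i < parts.size(); i += maxPerComposite) {
                int end = Math.min(i + maxPerComposite, parts.size());
                out.add(new CompositeSplit(e.getKey(), new ArrayList<>(parts.subList(i, end))));
            }
        }
        return out;
    }

    /**
     * Plays the role of a composite record reader: drains each underlying
     * reader in order before moving on to the next one.
     */
    static <T> Iterator<T> chain(List<Iterator<T>> readers) {
        return new Iterator<T>() {
            private int idx = 0;

            @Override
            public boolean hasNext() {
                while (idx < readers.size()) {
                    if (readers.get(idx).hasNext()) {
                        return true;
                    }
                    idx++; // current raw split exhausted; advance to the next one
                }
                return false;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return readers.get(idx).next();
            }
        };
    }

    public static void main(String[] args) {
        List<RawSplit> raw = List.of(
                new RawSplit("r1", "hostA"),
                new RawSplit("r2", "hostB"),
                new RawSplit("r3", "hostA"),
                new RawSplit("r4", "hostA"),
                new RawSplit("r5", "hostB"));
        // 5 raw splits become 3 composites: hostA [r1, r3], hostA [r4], hostB [r2, r5]
        for (CompositeSplit c : group(raw, 2)) {
            List<String> names = new ArrayList<>();
            for (RawSplit s : c.parts()) {
                names.add(s.name());
            }
            System.out.println(c.host() + " " + names);
        }
        Iterator<String> records = chain(List.of(
                List.of("a", "b").iterator(),
                Collections.<String>emptyIterator(),
                List.of("c").iterator()));
        StringBuilder sb = new StringBuilder();
        while (records.hasNext()) {
            sb.append(records.next());
        }
        System.out.println(sb); // prints "abc"
    }
}
```

Grouping by host before packing is what keeps locality intact in this sketch.
A composite never mixes splits whose preferred hosts differ, so the scheduler
can still place each map task next to all of its data.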
One application of this is grouping TableSplits when the raw splits come from
multiple regions and are filtered by key ranges. We use this to shard/distribute
time-based incremental access to an HBase table.
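The issue does not say how the key ranges are derived from the time-based
access pattern, so this sketch only models why the raw splits become small and
numerous. Each region's [start, end) boundary is intersected with each filtering
key range, yielding one raw split per overlap. Everything here is a simplified
stand-in rather than HBase's `TableInputFormat` code. The hypothetical
`KeyRange` record is one example: plain ASCII `String`s replace HBase's
byte-array row keys, and an empty key means "unbounded", as it does for HBase
region boundaries.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionRangeSketch {

    /** Simplified [start, end) row-key interval; an empty end means "no upper bound". */
    record KeyRange(String start, String end) {}

    /** Smaller of two end keys, treating "" as +infinity. */
    static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }

    /** Overlap of a region with a scan range, or null if they do not intersect. */
    static KeyRange clip(KeyRange region, KeyRange scan) {
        // "" is the smallest String, so taking the larger start also handles an unbounded start.
        String start = region.start().compareTo(scan.start()) >= 0 ? region.start() : scan.start();
        String end = minEnd(region.end(), scan.end());
        if (!end.isEmpty() && start.compareTo(end) >= 0) {
            return null; // empty intersection
        }
        return new KeyRange(start, end);
    }

    /** One raw split per (region, key range) overlap. */
    static List<KeyRange> rawSplits(List<KeyRange> regions, List<KeyRange> scans) {
        List<KeyRange> out = new ArrayList<>();
        for (KeyRange r : regions) {
            for (KeyRange s : scans) {
                KeyRange c = clip(r, s);
                if (c != null) {
                    out.add(c);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<KeyRange> regions = List.of(
                new KeyRange("", "g"), new KeyRange("g", "p"), new KeyRange("p", ""));
        List<KeyRange> scans = List.of(
                new KeyRange("a", "c"), new KeyRange("h", "k"), new KeyRange("m", "r"));
        // prints [a, c)  [h, k)  [m, p)  [p, r)  on separate lines
        for (KeyRange k : rawSplits(regions, scans)) {
            System.out.println("[" + k.start() + ", " + k.end() + ")");
        }
    }
}
```

Three key ranges over three regions already give four splits, because the range
[m, r) straddles a region boundary. With many regions and many ranges, the count
can grow toward the product of the two. This is the kind of input that the
wrapping InputFormat regroups into fewer, locality-preserving map tasks.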