[ https://issues.apache.org/jira/browse/ACCUMULO-508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christopher Tubbs resolved ACCUMULO-508.
----------------------------------------
Resolution: Not A Problem
Feature already implemented.
> Multi-range input format
> ------------------------
>
> Key: ACCUMULO-508
> URL: https://issues.apache.org/jira/browse/ACCUMULO-508
> Project: Accumulo
> Issue Type: New Feature
> Components: client
> Reporter: John Vines
> Priority: Major
> Labels: mapreduce, newbie
>
> Maybe for 1.4.1.
> Our current input format will always apply one range (potentially split at
> tablet boundaries) per mapper. This is great for situations where you have a
> few larger ranges. However, there is a potential use case for many small
> ranges. Aside from the problem with a large job configuration (ACCUMULO-507),
> this will result in a LOT of mappers doing very little work. We should have
> an expanded input format which will bundle ranges together to a single
> mapper, ideally while trying to maintain locality. This will optimize jobs
> with a lot of ranges by reducing the amount of mapper overhead involved. I
> think very little will change with the RecordReader. The onus should still go
> to the end user to detect when a range change has been made (via Key change),
> so it will still emit Key/Value pairs, just like the regular input format.
> This could possibly be extended to the whole row input format as well.
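
For illustration only, the bundling idea in the quoted description could be sketched roughly as follows: group ranges by the server hosting their tablet, then cap how many ranges go into one split. This is not Accumulo's implementation. LocatedRange, BundledSplit, bundle, and maxPerSplit are hypothetical names invented for this sketch, and it uses only the Java standard library, not the Accumulo client API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RangeBundler {

    /** Hypothetical stand-in for a scan range plus the server hosting its tablet. */
    public static final class LocatedRange {
        public final String startRow;
        public final String endRow;
        public final String location;

        public LocatedRange(String startRow, String endRow, String location) {
            this.startRow = startRow;
            this.endRow = endRow;
            this.location = location;
        }
    }

    /** Hypothetical stand-in for one input split that carries several ranges. */
    public static final class BundledSplit {
        public final String location;
        public final List<LocatedRange> ranges;

        public BundledSplit(String location, List<LocatedRange> ranges) {
            this.location = location;
            this.ranges = ranges;
        }
    }

    /**
     * Groups ranges by location (to keep locality), then cuts each group into
     * chunks of at most maxPerSplit ranges so no single mapper gets too many.
     */
    public static List<BundledSplit> bundle(List<LocatedRange> ranges, int maxPerSplit) {
        if (maxPerSplit < 1) {
            throw new IllegalArgumentException("maxPerSplit must be at least 1");
        }
        // LinkedHashMap keeps first-seen order of locations, so output is deterministic.
        Map<String, List<LocatedRange>> byLocation = new LinkedHashMap<>();
        for (LocatedRange r : ranges) {
            byLocation.computeIfAbsent(r.location, k -> new ArrayList<>()).add(r);
        }
        List<BundledSplit> splits = new ArrayList<>();
        for (Map.Entry<String, List<LocatedRange>> e : byLocation.entrySet()) {
            List<LocatedRange> group = e.getValue();
            for (int i = 0; i < group.size(); i += maxPerSplit) {
                int end = Math.min(i + maxPerSplit, group.size());
                splits.add(new BundledSplit(e.getKey(), new ArrayList<>(group.subList(i, end))));
            }
        }
        return splits;
    }
}
```

Under these assumptions, each BundledSplit would map to one mapper, and a record reader would iterate its ranges in order while still emitting plain Key/Value pairs, as the description suggests.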