[
https://issues.apache.org/jira/browse/ACCUMULO-508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978449#comment-13978449
]
Christopher Tubbs commented on ACCUMULO-508:
--------------------------------------------
Doesn't the AccumuloInputFormat already do this? I'm pretty sure there's an
autoAdjust feature that merges overlapping ranges, splits ranges on tablet
boundaries, and then assigns them. This is done by default. If this feature is
turned off, the ranges are given to mappers exactly as they were given to the
job: 1 range per mapper.
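
For anyone unfamiliar with the merge step mentioned above, here is a minimal
sketch of the general idea. It is not the AccumuloInputFormat code. RowRange
and mergeOverlapping are hypothetical names made up for this example, ranges
are simplified to inclusive [start, end] row strings, and the tablet-boundary
split is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RangeMergeSketch {

    /** Hypothetical stand-in for a row range; both ends are inclusive. */
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /** Sorts by start row, then folds each range into the previous one if they overlap. */
    static List<RowRange> mergeOverlapping(List<RowRange> input) {
        List<RowRange> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparing((RowRange r) -> r.start));

        List<RowRange> merged = new ArrayList<>();
        for (RowRange r : sorted) {
            if (!merged.isEmpty()) {
                RowRange last = merged.get(merged.size() - 1);
                // Overlap: this range starts at or before the previous range ends.
                if (r.start.compareTo(last.end) <= 0) {
                    String end = r.end.compareTo(last.end) > 0 ? r.end : last.end;
                    merged.set(merged.size() - 1, new RowRange(last.start, end));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<RowRange> ranges = Arrays.asList(
            new RowRange("m", "p"),
            new RowRange("a", "f"),
            new RowRange("d", "h"));
        System.out.println(mergeOverlapping(ranges));
    }
}
```

In this simplified model, [a,f] and [d,h] collapse into a single range while
[m,p] stays separate, so the job would hand out two ranges instead of three.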
> Multi-range input format
> ------------------------
>
> Key: ACCUMULO-508
> URL: https://issues.apache.org/jira/browse/ACCUMULO-508
> Project: Accumulo
> Issue Type: New Feature
> Components: client
> Reporter: John Vines
> Labels: mapreduce, newbie
>
> Maybe for 1.4.1.
> Our current input format will always apply one range (potentially split at
> tablet boundaries) per mapper. This is great for situations where you have a
> few larger ranges. However, there is a potential use case for many small
> ranges. Aside from the problem with a large job configuration (ACCUMULO-507),
> this will result in a LOT of mappers doing very little work. We should have
> an expanded input format which will bundle ranges together to a single
> mapper, ideally while trying to maintain locality. This will optimize jobs
> with a lot of ranges by reducing the amount of mapper overhead involved. I
> think very little will change with the RecordReader. The onus should still go
> to the end user to detect when a range change has been made (via Key change),
> so it will still emit Key/Value pairs, just like the regular input format.
> This could possibly be extended to the whole row input format as well.
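
The bundling idea in the description could be sketched roughly as follows.
This is only an illustration under stated assumptions, not a proposed
implementation: LocatedRange, bundle, and maxPerMapper are hypothetical names,
ranges are plain strings, and each range is assumed to already be tagged with
the tablet server that hosts it. Ranges are grouped by location first, so that
each bundle stays local, and each group is then cut into chunks of at most
maxPerMapper ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RangeBundleSketch {

    /** Hypothetical pairing of a range with the tablet server assumed to host it. */
    static final class LocatedRange {
        final String location;
        final String range;

        LocatedRange(String location, String range) {
            this.location = location;
            this.range = range;
        }
    }

    /** Groups ranges by location, then cuts each group into bundles of at most maxPerMapper. */
    static List<List<String>> bundle(List<LocatedRange> ranges, int maxPerMapper) {
        if (maxPerMapper < 1) {
            throw new IllegalArgumentException("maxPerMapper must be at least 1");
        }

        // TreeMap keeps the locations in a deterministic order.
        Map<String, List<String>> byLocation = new TreeMap<>();
        for (LocatedRange lr : ranges) {
            byLocation.computeIfAbsent(lr.location, k -> new ArrayList<>()).add(lr.range);
        }

        // Each bundle would become the input for one mapper.
        List<List<String>> bundles = new ArrayList<>();
        for (List<String> local : byLocation.values()) {
            for (int i = 0; i < local.size(); i += maxPerMapper) {
                int to = Math.min(i + maxPerMapper, local.size());
                bundles.add(new ArrayList<>(local.subList(i, to)));
            }
        }
        return bundles;
    }

    public static void main(String[] args) {
        List<LocatedRange> ranges = Arrays.asList(
            new LocatedRange("ts1", "r1"),
            new LocatedRange("ts2", "r4"),
            new LocatedRange("ts1", "r2"),
            new LocatedRange("ts1", "r3"),
            new LocatedRange("ts2", "r5"));
        System.out.println(bundle(ranges, 2));
    }
}
```

With five small ranges on two servers and a cap of two ranges per mapper, this
yields three bundles rather than five single-range mappers, and no bundle
mixes ranges from different servers.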