[ https://issues.apache.org/jira/browse/ACCUMULO-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951604#comment-13951604 ]
Corey J. Nolet commented on ACCUMULO-2553:
------------------------------------------
I've created a GroupedKeyRangePartitioner that lets the user specify multiple
splits files, each associated with a group. Currently, it expects the mapper to
emit a GroupedKey object (a GroupedKey is a Writable holding a String/Text
group and an o.a.a.core.data.Key). The partitioner looks up the splits file for
the key's group in the configuration and uses it to determine the partition.
The number of bins is based on the sum of the split points across all groups,
which guarantees that each file is written by its own reducer.
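A minimal sketch of the grouped partitioning idea described above, in plain
Java with no Hadoop or Accumulo dependencies. It is not the actual
GroupedKeyRangePartitioner: the class name GroupedPartitionSketch, its methods,
and the use of Strings in place of Text/Key are assumptions made for
illustration. The sketch also assumes that a group with n split points gets
n + 1 bins and that each group's bins occupy a contiguous block of partitions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions.
class GroupedPartitionSketch {
    // Stand-in for the per-group splits files: group name -> sorted split points.
    private final Map<String, String[]> splitsByGroup = new LinkedHashMap<>();
    // First partition index assigned to each group.
    private final Map<String, Integer> offsetByGroup = new LinkedHashMap<>();
    private int totalPartitions = 0;

    // Registers a group and reserves (number of splits + 1) partitions for it.
    void addGroup(String group, String... splits) {
        if (splitsByGroup.containsKey(group)) {
            throw new IllegalArgumentException("Group already registered: " + group);
        }
        String[] sorted = splits.clone();
        Arrays.sort(sorted);
        splitsByGroup.put(group, sorted);
        offsetByGroup.put(group, totalPartitions);
        totalPartitions += sorted.length + 1;
    }

    // Total number of bins across all groups.
    int getNumPartitions() {
        return totalPartitions;
    }

    // Finds the bin for a row within its group, then shifts it by the group's offset.
    int getPartition(String group, String row) {
        String[] splits = splitsByGroup.get(group);
        if (splits == null) {
            throw new IllegalArgumentException("Unknown group: " + group);
        }
        int idx = Arrays.binarySearch(splits, row);
        // A row equal to a split point falls in the bin that ends at that split.
        int bin = idx < 0 ? -(idx + 1) : idx;
        return offsetByGroup.get(group) + bin;
    }
}
```

Because every group owns its own block of partitions, no reducer ever receives
keys from more than one group, which is the property that lets each output file
be written by a single reducer.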
This paradigm seems in line with Hadoop's MultipleOutputs class, where the
group in the GroupedKey can also be linked to the final path of the output
file. I am in the process of testing the MultipleOutputs solution. I think it
should be added to the examples when complete.
> AccumuloFileOutputFormat should be able to support output for multiple tables.
> ------------------------------------------------------------------------------
>
> Key: ACCUMULO-2553
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2553
> Project: Accumulo
> Issue Type: New Feature
> Reporter: Corey J. Nolet
> Priority: Minor
>
> This may not necessarily be something that would require changes in the
> AccumuloFileOutputFormat itself. Perhaps the ability to use it with Hadoop's
> MultipleOutputs is really the solution.
> It would be useful if the user could specify multiple directories where
> RFiles should be placed and have a mechanism for populating the RFiles in the
> necessary directories based on a table name or group name.