[jira] [Commented] (ACCUMULO-2553) AccumuloFileOutputFormat should be able to support output for multiple tables.

Corey J. Nolet (JIRA) Fri, 28 Mar 2014 16:53:58 -0700

    [ https://issues.apache.org/jira/browse/ACCUMULO-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13951604#comment-13951604 ]


Corey J. Nolet commented on ACCUMULO-2553:
------------------------------------------

I've created a GroupedKeyRangePartitioner that lets the user specify multiple 
splits files, each associated with a group. Currently, it expects the mapper to 
emit a GroupedKey object (a Writable holding a String/Text group and an 
o.a.a.core.data.Key). The partitioner uses the group to pull the matching 
splits file out of the configuration and determine the partition. The number of 
bins is based on the sum of all the split points, which guarantees that each 
file is written by its own reducer.
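
To make the binning idea concrete, here is a minimal sketch of how the 
partition lookup could work. It is not the actual GroupedKeyRangePartitioner: 
it has no Hadoop or Accumulo dependencies, uses plain Strings in place of 
Text/Key, and the class name, method names, and the choice of 
(split points + 1) bins per group are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, dependency-free sketch of grouped range partitioning.
// Assumption: each group gets (number of split points + 1) bins, and the
// groups' bins are laid out one after another in registration order.
final class GroupedRangePartitionSketch {
    private final Map<String, String[]> splitsByGroup = new LinkedHashMap<>();
    private final Map<String, Integer> offsetByGroup = new LinkedHashMap<>();
    private int totalPartitions = 0;

    // Register the split points for one group (stands in for a splits file).
    void addGroup(String group, String... splits) {
        String[] sorted = splits.clone();
        Arrays.sort(sorted);
        splitsByGroup.put(group, sorted);
        offsetByGroup.put(group, totalPartitions);
        totalPartitions += sorted.length + 1;
    }

    // Total number of bins, i.e. the number of reducers the job would need.
    int getNumPartitions() {
        return totalPartitions;
    }

    // Find the bin for a row within its group, then shift by the group offset.
    int getPartition(String group, String row) {
        String[] splits = splitsByGroup.get(group);
        if (splits == null) {
            throw new IllegalArgumentException("Unknown group: " + group);
        }
        int idx = Arrays.binarySearch(splits, row);
        // A negative result encodes the insertion point as -(point) - 1.
        int bin = idx < 0 ? -(idx + 1) : idx;
        return offsetByGroup.get(group) + bin;
    }
}
```

Because each group owns a disjoint range of partition numbers, no reducer 
ever receives keys from more than one group, so each output file belongs to 
exactly one group.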

This paradigm seems in line with the MultipleOutputs class, where the group in 
the GroupedKey can also be linked to the ultimate path for the output file. I 
am in the process of testing the MultipleOutputs solution. I think it should be 
added to the examples when complete.

> AccumuloFileOutputFormat should be able to support output for multiple tables.
> ------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2553
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2553
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Corey J. Nolet
>            Priority: Minor
>
> This may not necessarily be something that would require changes in the 
> AccumuloFileOutputFormat itself. Perhaps the ability to use it with Hadoop's 
> MultipleOutputs is really the solution.
> It would be useful if the user could specify multiple directories where 
> RFiles should be placed and have a mechanism for populating the RFiles in the 
> necessary directories based on a table name or group name. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
