[ 
https://issues.apache.org/jira/browse/ACCUMULO-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676043#comment-13676043
 ] 

Pradeep Gollakota commented on ACCUMULO-391:
--------------------------------------------

This would be a great addition.

We have just started using Pig with Accumulo at my company. The first thing we 
noticed is that in many cases where we join data from one Accumulo table with 
data from another, we first have to dump both tables to HDFS (for example with 
PigStorage), load the data back, and then join it. This is because the scan 
information is encoded in the job configuration, so when Pig uses the 
MultiInputFormat to scan both tables in the same job, only one table ends up 
being exported from Accumulo.
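
To illustrate the collision, here is a minimal sketch. It is not Accumulo's or 
Pig's actual code: a plain Map stands in for the Hadoop job configuration, and 
the key name and helper methods are hypothetical. It only assumes that the 
input table is stored under one fixed key per job, which is what the behavior 
we saw suggests.

```java
import java.util.HashMap;
import java.util.Map;

public class SingleTableConfigSketch {
    // Hypothetical property name; stands in for whatever key the input
    // format really uses to record the table to scan.
    static final String TABLE_KEY = "sketch.inputformat.table";

    // Mimics a per-job setter that writes the table name under one fixed key.
    static void setInputTable(Map<String, String> jobConf, String table) {
        jobConf.put(TABLE_KEY, table);
    }

    // Mimics what the input format would read back when planning the scan.
    static String configuredTable(Map<String, String> jobConf) {
        return jobConf.get(TABLE_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        setInputTable(jobConf, "table_a"); // first load in the script
        setInputTable(jobConf, "table_b"); // second load overwrites the first
        // Only one table name survives in the shared configuration.
        System.out.println(configuredTable(jobConf)); // prints "table_b"
    }
}
```

With a single key per job, the second table replaces the first, which matches 
what we see: only one table is scanned.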

If this is completed, we could use the MultiTableInputFormat instead of 
Accumulo(Row)InputFormat to optimize our Pig scripts.

Any thoughts on when this would be included?
                
> Multi-table Accumulo input format
> ---------------------------------
>
>                 Key: ACCUMULO-391
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-391
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: John Vines
>            Assignee: William Slacum
>            Priority: Minor
>              Labels: mapreduce
>         Attachments: multi-table-if.patch, new-multitable-if.patch
>
>
> Just realized we had no MR input method which supports multiple Tables for an 
> input format. I would see it making the table the mapper's key and making the 
> Key/Value a tuple, or alternatively have the Table/Key be the key tuple and 
> stick with Values being the value.
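
The two record layouts proposed in the description could be sketched roughly 
as follows. This is only an illustration: Strings stand in for Accumulo's Key 
and Value types, and the class and method names are hypothetical, not taken 
from either attached patch.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class MultiTableRecordShapes {
    // Option 1: the table name is the mapper key and the (Key, Value)
    // pair travels together as the mapper value.
    static Map.Entry<String, Map.Entry<String, String>> tableAsKey(
            String table, String key, String value) {
        Map.Entry<String, String> keyValue = new SimpleImmutableEntry<>(key, value);
        return new SimpleImmutableEntry<>(table, keyValue);
    }

    // Option 2: the (table, Key) pair is the mapper key and the mapper
    // value stays a plain Value.
    static Map.Entry<Map.Entry<String, String>, String> tableAndKeyAsKey(
            String table, String key, String value) {
        Map.Entry<String, String> tableKey = new SimpleImmutableEntry<>(table, key);
        return new SimpleImmutableEntry<>(tableKey, value);
    }

    public static void main(String[] args) {
        System.out.println(tableAsKey("users", "row1", "alice").getKey());
        System.out.println(tableAndKeyAsKey("users", "row1", "alice").getValue());
    }
}
```

The second layout keeps the value type unchanged from the single-table case, 
so existing mappers would only need to adapt how they read the key.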

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira