[
https://issues.apache.org/jira/browse/CRUNCH-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13714869#comment-13714869
]
Chao Shi commented on CRUNCH-212:
---------------------------------
Hi Reid, I haven't thought this through thoroughly yet.
bq. - setting up the partitioning to match regions on an existing HBase table
I think we have to set up a TotalOrderPartitioner. The partition boundaries are
determined from a scan on ".META.".
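A minimal plain-Java sketch of the lookup a total-order partitioner performs, assuming the split points are the sorted start keys of every region except the first. The class and method names (RegionPartitionSketch, compareRows, partitionFor) are hypothetical, and this is not the Hadoop TotalOrderPartitioner itself, only an illustration of the idea:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration: maps a row key to the index of the region that owns it.
class RegionPartitionSketch {

    // Unsigned lexicographic byte comparison, the ordering HBase applies to row keys.
    static int compareRows(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // splitPoints: sorted region start keys, excluding the first region's empty start key
    // (an assumption of this sketch). A row equal to a start key belongs to that region.
    static int partitionFor(byte[] row, List<byte[]> splitPoints) {
        int lo = 0;
        int hi = splitPoints.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareRows(row, splitPoints.get(mid)) >= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Three regions: [ "", "g" ), [ "g", "p" ), [ "p", "" )
        List<byte[]> splits = Arrays.asList(bytes("g"), bytes("p"));
        for (String row : new String[] {"a", "g", "z"}) {
            System.out.println(row + " -> partition " + partitionFor(bytes(row), splits));
        }
    }
}
```

With one reducer per partition, each reducer then sees only rows that fall inside a single region, which is what lets each generated HFile line up with one region.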
bq. - handling multiple column families
I think we can take a PCollection<KeyValue> as input from the user, then divide
it into multiple PCollection<KeyValue>s by column family. Then sort each family
and write it to its own HFile target. This requires the user to tell us
explicitly which column families are used, as Crunch cannot determine the number
of outputs at runtime. This approach looks more "crunch-style". :)
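The split-then-sort idea can be sketched in plain Java without the Crunch or HBase APIs. Everything here is hypothetical: Cell is a simplified stand-in for KeyValue, in-memory lists stand in for PCollections, and String ordering stands in for HBase's byte ordering. The declared family list is the part the user would have to supply up front:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dividing cells by column family and sorting each group.
class FamilySplitSketch {

    // Simplified stand-in for HBase's KeyValue (strings instead of byte arrays).
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final String value;

        Cell(String row, String family, String qualifier, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // One output list per declared family; each list is sorted by row, then qualifier.
    static Map<String, List<Cell>> splitByFamily(List<Cell> cells, List<String> families) {
        Map<String, List<Cell>> out = new LinkedHashMap<>();
        for (String family : families) {
            out.put(family, new ArrayList<>());
        }
        for (Cell cell : cells) {
            List<Cell> bucket = out.get(cell.family);
            if (bucket == null) {
                // The set of outputs is fixed up front, so an unknown family is an error.
                throw new IllegalArgumentException("undeclared column family: " + cell.family);
            }
            bucket.add(cell);
        }
        Comparator<Cell> order =
            Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.qualifier);
        for (List<Cell> bucket : out.values()) {
            bucket.sort(order);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("r2", "cf1", "a", "v1"),
            new Cell("r1", "cf2", "a", "v2"),
            new Cell("r1", "cf1", "b", "v3"));
        Map<String, List<Cell>> byFamily = splitByFamily(cells, Arrays.asList("cf1", "cf2"));
        for (Map.Entry<String, List<Cell>> e : byFamily.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " cell(s)");
        }
    }
}
```

Rejecting undeclared families mirrors the constraint above: the number of outputs has to be known when the pipeline is planned, not discovered while the data is read.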
Any suggestions are welcome.
> Need target wrapper for HFileOutputFormat
> -----------------------------------------
>
> Key: CRUNCH-212
> URL: https://issues.apache.org/jira/browse/CRUNCH-212
> Project: Crunch
> Issue Type: Improvement
> Components: IO
> Reporter: Chao Shi
> Attachments: crunch-212-draft.patch
>
>
> I need to import data to HBase from MR. I found HFileOutputFormat is ~5x more
> efficient than HTableOutputFormat. So maybe we need a target wrapper for it.
> Furthermore, is it possible to call HBase to load it automatically after
> HFiles are generated?