[ https://issues.apache.org/jira/browse/PIG-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Graham updated PIG-1782:
-----------------------------
Attachment: apply-PIG-1782-patch.sh
PIG-1782_1.patch
Attached are two files: a patch and a script to apply it. A few things to note
about this patch:
* It requires HBase 0.89.0 or later and effectively replaces PIG-1680.
* I've updated HBaseStorage for now. If we want to deprecate that class and
create a new one instead, I can do that.
* I added support for a {{columnPrefix}} option to filter the columns
returned. Proper column prefix functionality, though, requires HBASE-3550.
* I had to do some hackery in {{setStoreLocation}} and {{getOutputFormat}}
with the conf objects to keep HBase from throwing NPEs (see the comments in
the code). A review of how I'm handling the conf objects in that part of the
code would be welcome.
* There are still no unit tests for this code, since it's tricky to test. I
have a few simple HBase and Pig scripts I've been using for testing that I can
provide.
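As a rough illustration of what the {{columnPrefix}} option is meant to do, the Java sketch below keeps only the columns of one family whose qualifier starts with a given prefix. It is a hypothetical stand-in, not code from the patch: {{ColumnPrefixSketch}} and {{filterByPrefix}} are invented names, qualifiers are modelled as Strings in a sorted map rather than HBase's byte[], and the filtering happens on the client, whereas the patch's real mechanism (which depends on HBASE-3550) is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: client-side column prefix filtering over one column
 * family. Qualifiers are Strings in a TreeMap, mirroring the sorted order in
 * which HBase returns a row's columns; real HBase code works on byte[].
 */
final class ColumnPrefixSketch {

    /** Returns qualifier -> value for every qualifier starting with prefix, in sorted order. */
    static Map<String, String> filterByPrefix(NavigableMap<String, String> family,
                                              String prefix) {
        Map<String, String> out = new LinkedHashMap<String, String>();
        // In a sorted map all qualifiers sharing a prefix form one contiguous run
        // starting at the first key >= prefix, so we can stop at the first miss.
        for (Map.Entry<String, String> e : family.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cpu = new TreeMap<String, String>();
        cpu.put("sys.0", "3");
        cpu.put("sys.1", "5");
        cpu.put("user.0", "40");
        cpu.put("user.1", "38");
        // Prints {user.0=40, user.1=38}
        System.out.println(filterByPrefix(cpu, "user"));
    }
}
```

Scanning from {{tailMap(prefix, true)}} and stopping at the first non-match avoids touching the whole family, which is the same reason a server-side prefix filter is preferable to fetching every column.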
> Add ability to load data by column family in HBaseStorage
> ---------------------------------------------------------
>
> Key: PIG-1782
> URL: https://issues.apache.org/jira/browse/PIG-1782
> Project: Pig
> Issue Type: New Feature
> Environment: Java 6, Mac OS X 10.6
> Reporter: Eric Yang
> Assignee: Bill Graham
> Attachments: PIG-1782_1.patch, apply-PIG-1782-patch.sh
>
>
> It would be nice to be able to load all columns in a column family using
> shorthand syntax like:
> {noformat}
> CpuMetrics = load 'hbase://SystemMetrics' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cpu:','-loadKey');
> {noformat}
> Assuming the cpu column family contains the columns cpu:sys.0, cpu:sys.1,
> cpu:user.0, and cpu:user.1, CpuMetrics would contain something like:
> {noformat}
> (rowKey, cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1)
> {noformat}
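The requested shorthand could work roughly as sketched below in Java. The sketch is hypothetical: {{FamilyLoadSketch}} and {{toTuple}} are invented names, a row is modelled as nested sorted maps of Strings (family -> qualifier -> value) rather than an HBase result object, and it always puts the row key first, as the issue's {{-loadKey}} option would. A spec ending in ':' expands to every column in that family in sorted qualifier order; a full {{family:qualifier}} spec selects one column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the requested 'family:' shorthand: build the flat
 * tuple (rowKey, values...) for one row from a list of column specs.
 */
final class FamilyLoadSketch {

    static List<String> toTuple(String rowKey,
                                Map<String, NavigableMap<String, String>> row,
                                String... columnSpecs) {
        List<String> tuple = new ArrayList<String>();
        tuple.add(rowKey); // row key always comes first in this sketch
        for (String spec : columnSpecs) {
            int colon = spec.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected family:[qualifier], got " + spec);
            }
            String family = spec.substring(0, colon);
            String qualifier = spec.substring(colon + 1);
            NavigableMap<String, String> columns = row.get(family);
            if (qualifier.isEmpty()) {
                // Whole-family shorthand: append every value in sorted qualifier order.
                if (columns != null) {
                    tuple.addAll(columns.values());
                }
            } else {
                // Single column: append its value, or null if the row lacks it.
                tuple.add(columns == null ? null : columns.get(qualifier));
            }
        }
        return tuple;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> cpu = new TreeMap<String, String>();
        cpu.put("user.1", "38");
        cpu.put("sys.0", "3");
        cpu.put("user.0", "40");
        cpu.put("sys.1", "5");
        Map<String, NavigableMap<String, String>> row =
                new TreeMap<String, NavigableMap<String, String>>();
        row.put("cpu", cpu);
        // Prints [host1, 3, 5, 40, 38]: the key, then cpu:sys.0, cpu:sys.1, cpu:user.0, cpu:user.1
        System.out.println(toTuple("host1", row, "cpu:"));
    }
}
```

Keeping qualifiers sorted gives a deterministic field order, which matters because a flattened tuple like the one above carries no column names; rows with different sets of columns in the family would produce tuples of different lengths.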