[ 
https://issues.apache.org/jira/browse/PIG-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3961:
------------------------------------

    Status: Open  (was: Patch Available)

Cancelling patch. [~toffer] suggested supporting the filter language that 
HBase already uses.

[~toffer]'s comments from an internal discussion:
I just remembered we already have a language (albeit a clunky one) for
specifying filters, which is used in the CLI, REST, and Thrift. Why not fix
the filter pushdown problem holistically and
support that language in HBaseStorage? It should be easy enough, as the
language is encapsulated in the ParseFilter class.
Here is sample usage documentation:
http://hbase.apache.org/book/thrift.html
Here is a patch we are contributing back to the community, as a reference
for implementation:
https://issues.apache.org/jira/secure/attachment/12643859/HBASE-9345_trunk.patch
One small thing to keep in mind is that you would need to use
Bytes.toStringBinary/toBytesBinary before you pass things to the filter,
so that users can specify arbitrary byte arrays.
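
To illustrate the byte-array point above, here is a minimal, self-contained sketch of the escaping convention that Bytes.toStringBinary/toBytesBinary follow: printable ASCII is kept as is and every other byte is written as \xHH. The class name BinaryEscape and its methods are hypothetical stand-ins written for this example; they are not the HBase implementation, and the exact set of characters HBase leaves unescaped may differ between versions.

```java
import java.io.ByteArrayOutputStream;

/**
 * Hypothetical stand-in for the \xHH escaping convention used by
 * HBase's Bytes.toStringBinary/toBytesBinary. Simplified sketch only.
 */
public final class BinaryEscape {
    private BinaryEscape() {}

    /** Printable ASCII (except backslash) is kept; any other byte becomes \xHH. */
    public static String toStringBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int ch = b & 0xFF; // treat the byte as unsigned
            if (ch >= ' ' && ch <= '~' && ch != '\\') {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    /**
     * Reverses toStringBinary: \xHH sequences become single bytes and all
     * other characters are written as their low 8 bits (assumes the input
     * only contains characters below 256).
     */
    public static byte[] toBytesBinary(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 3 < s.length()
                    && s.charAt(i + 1) == 'x'
                    && isHex(s.charAt(i + 2))
                    && isHex(s.charAt(i + 3));
            if (isEscape) {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    private static boolean isHex(char c) {
        return Character.digit(c, 16) >= 0;
    }
}
```

With something like this in place, a value typed in a Pig script could be decoded to raw bytes before being handed to the filter, and encoded back to a readable string when it needs to be embedded in filter text.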

> Adding HBaseStorage cell value filters
> --------------------------------------
>
>                 Key: PIG-3961
>                 URL: https://issues.apache.org/jira/browse/PIG-3961
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Mike Welch
>            Assignee: Mike Welch
>            Priority: Minor
>             Fix For: 0.14.0
>
>         Attachments: filters-patch.v2.diff
>
>
> Adding three additional server side filtering options when loading data with 
> HBaseStorage:
> # specified cf:col does not exist
> {{-null cf:col}}
> # specified cf:col must exist
> {{-notnull cf:col}}
> # specified cf:col contains the given value
> {{-val cf:col=value}}
> These are meant to replace (and optimize by reducing data transfer) the 
> frequent paradigm in pig of loading data and immediately filtering for a 
> specific condition.  For example
> data = load 'hbase://mytable' using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
> data_with_value = filter data by cf#'col' == 'value' ;
> Can be replaced with:
> data_with_value = load 'hbase://mytable' using 
> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', '-val cf:col=value') 
> as (cf:map[]) ;



--
This message was sent by Atlassian JIRA
(v6.2#6252)
