Mike Welch created PIG-3961:
-------------------------------

             Summary: Adding HBaseStorage cell value filters
                 Key: PIG-3961
                 URL: https://issues.apache.org/jira/browse/PIG-3961
             Project: Pig
          Issue Type: New Feature
            Reporter: Mike Welch
            Priority: Minor


Adding three additional server side filtering options when loading data with 
HBaseStorage:

# specified cf:col does not exist
{{-null cf:col}}
# specified cf:col must exist
{{-notnull cf:col}}
# specified cf:col contains the given value
{{-val cf:col=value}}

These are meant to replace (and optimize by reducing data transfer) the 
frequent paradigm in pig of loading data and immediately filtering for a 
specific condition.  For example

data = load 'hbase://mytable' using 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*') as (cf:map[]) ;
data_with_value = filter data by cf#'col' = 'value' ;

Can be replaced with:

data_with_value = load 'hbase://mytable' using 
org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:*', 'cf:col=value') as 
(cf:map[]) ;




--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to