[ 
https://issues.apache.org/jira/browse/HBASE-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748192#comment-13748192
 ] 

James Taylor commented on HBASE-9316:
-------------------------------------

It doesn't help distinguish null columns; it just makes it more efficient. 
Maybe there's an existing, better way, or a different JIRA to file, but 
let me try to explain more clearly:

Let's say you have a query like this:

SELECT * FROM t WHERE c IS NULL

If you have a regular Phoenix table, then we currently insert an empty key 
value for each row. So to satisfy this query, we can
- project our empty KeyValue cf/cq plus the cf/cq for c.
- in our filter, include the row if it doesn't have the c cf/cq. We know we'll 
get called, since we know that every row has this empty key value.
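To make the two steps above concrete, here is a minimal plain-Java model of the check. It does not use the HBase or Phoenix APIs; the class name, method name, and the "cf:cq" string encoding of cells are all hypothetical and only there for illustration. The one assumption it encodes is that a row is only visited if it has at least one projected cell, which is why the empty key value has to be part of the projection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model (no HBase dependency) of the IS NULL check; all names are hypothetical. */
final class IsNullScanSketch {
    /**
     * Returns the keys of rows in which nullColumn is absent, reading only projected cells.
     * table maps a row key to the set of "cf:cq" cells present in that row.
     * A row with no projected cell is never visited, so it can never be included.
     */
    static List<String> rowsWhereNull(Map<String, Set<String>> table,
                                      Set<String> projected,
                                      String nullColumn) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> row : table.entrySet()) {
            boolean visited = false;   // did the scan see any projected cell?
            boolean hasColumn = false; // was the IS NULL column among them?
            for (String cell : row.getValue()) {
                if (!projected.contains(cell)) {
                    continue; // not projected, so never read
                }
                visited = true;
                if (cell.equals(nullColumn)) {
                    hasColumn = true;
                }
            }
            if (visited && !hasColumn) {
                result.add(row.getKey());
            }
        }
        return result;
    }
}
```

In this model, projecting only the c column would return no rows at all, because a row without c is never visited; adding the empty key value to the projection is what lets the filter see such rows.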

Another option in Phoenix is to create a VIEW (a read-only table that maps to 
an existing HBase table). In this case, we won't have our empty key value, so 
we have to project everything into the scan and do the same as above.

So the problem stems from there being no way to specify a filter that gets 
invoked when a KeyValue is *not* present (or maybe there is a way?).

Instead, if this JIRA is implemented, I was thinking that Phoenix could have a 
MUST_PASS_ALL filter list with a filter for each column family. If the first 
filter finds the c KeyValue, then it would filter out the row. Otherwise, any 
of the subsequent filters would include the row. This way, you wouldn't 
necessarily load every store file or need to include an empty key value 
(though that still may be a more efficient way to go).
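As a rough illustration of that idea, the sketch below models one reading of it in plain Java, without any HBase classes. The class, its fields, and the nested-map table layout are hypothetical, and the counter only stands in for Store loads; it is not how HBase implements or would implement this. It also assumes that a row which has other cells in c's own column family can be included without consulting further families.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of evaluating one filter per column family lazily; names are hypothetical. */
final class LazyFamilySketch {
    /** Number of per-row family lookups performed; a stand-in for Store loads. */
    int familyLoads = 0;

    /**
     * table maps rowKey -> (column family -> qualifiers present in that family).
     * Returns the keys of rows where family:qualifier is absent.
     */
    List<String> rowsWhereNull(Map<String, Map<String, Set<String>>> table,
                               String family,
                               String qualifier,
                               List<String> otherFamilies) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, Set<String>>> row : table.entrySet()) {
            Map<String, Set<String>> cells = row.getValue();

            // First filter: look only at the family that holds the IS NULL column.
            familyLoads++;
            Set<String> first = cells.get(family);
            if (first != null && first.contains(qualifier)) {
                continue; // column present: filter the row out, load nothing else
            }
            if (first != null && !first.isEmpty()) {
                result.add(row.getKey()); // row exists in this family without the column
                continue;
            }

            // Subsequent filters: the first family with any cell includes the row.
            for (String other : otherFamilies) {
                familyLoads++;
                Set<String> present = cells.get(other);
                if (present != null && !present.isEmpty()) {
                    result.add(row.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

In this model, a row that has c costs a single family lookup, and a row without c stops at the first family that proves the row exists, so the remaining families are never touched for that row.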

Any ideas?
                
> Use JoinedHeap between MUST_PASS_ALL filters to better leverage essential 
> column family feature 
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9316
>                 URL: https://issues.apache.org/jira/browse/HBASE-9316
>             Project: HBase
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Currently, all column families in a MUST_PASS_ALL filter list are loaded in 
> advance of filtering. Instead, only the essential column family from the 
> first filter should be loaded and then its heap joined with subsequent 
> essential column family from the next filter in the list (probably up to some 
> reasonable/configurable limit).
> One particular Phoenix use case for this is when a SQL query is trying to 
> detect the absence of a KeyValue (through a <column> IS NULL clause). Our 
> workaround for a Phoenix TABLE is to insert a known, empty key value with 
> every row, or for a Phoenix VIEW (mapping to an existing HBase table) to 
> project everything. With this feature, we could instead use a filter per 
> column family and prevent the loading of the corresponding Store in many 
> cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
