[ 
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907204#comment-13907204
 ] 

James Taylor commented on PHOENIX-29:
-------------------------------------

Case #3 is an improvement only for mapped VIEWs. We can do this in a follow-up
check-in if you want. With a mapped VIEW, we do not have an empty key value in
each row. When a user creates a view directly against an HBase table, it's
read-only and we do not insert this empty key value like we do when a table is
created. So at query time, we cannot rely on it being there. The reason we add
this empty key value is for cases like the above:
{code}
select a, b from t where c = 5
{code}
Since a mapped VIEW has no empty key value, we have to project everything;
otherwise we'd miss rows where a and b are null. This is already somewhat more
expensive than with regular tables: imagine the case where there are five
column families, and we'd have to open the store for each of them. For the
table case, by contrast, we know we have the empty key value, so we only need
to project the single column family that contains it. By implementing case #3,
we'd prune what gets returned to the client for mapped views (just like we do
with your patch for tables). The trick is that we'd need to dynamically insert
an empty key value in each row before your filter runs. Then the same thing
will happen with your filter: there'd be no a or b KV, c would get removed, and
because there's the empty key value, we'd still return a row to the client
(which is what we need to have happen).
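
To make the row-survival argument concrete, here is a minimal sketch in plain
Java. It does not use the HBase or Phoenix APIs: a row is modeled as a list of
column names, and the class name, the method names, and the "_0" placeholder
for the empty key value are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class MappedViewProjectionSketch {
    // Placeholder name for the empty key value; an assumption for this sketch.
    static final String EMPTY_KV = "_0";

    /** Hypothetical step: add the empty key value to a row that lacks one. */
    static List<String> withEmptyKeyValue(List<String> rowColumns) {
        List<String> result = new ArrayList<>(rowColumns);
        if (!result.contains(EMPTY_KV)) {
            result.add(0, EMPTY_KV);
        }
        return result;
    }

    /** Keeps only the selected columns plus the empty key value. */
    static List<String> project(List<String> rowColumns, Set<String> selectColumns) {
        List<String> kept = new ArrayList<>();
        for (String column : rowColumns) {
            if (column.equals(EMPTY_KV) || selectColumns.contains(column)) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> select = Set.of("a", "b");
        // Mapped-view row for "select a, b from t where c = 5": a and b are null.
        List<String> row = List.of("c");
        // Without the empty key value nothing survives, so the row would be lost.
        System.out.println(project(row, select));
        // With it inserted first, the row still comes back (holding only the marker).
        System.out.println(project(withEmptyKeyValue(row), select));
    }
}
```

The point of the sketch is only the ordering: the empty key value has to be
added before the pruning step runs, otherwise a row whose selected columns are
all null disappears from the result.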

> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than in 
> selecting any other column. The reason is that when you project a column into 
> a Scan, HBase uses its ExplicitColumnTracker which does a reseek to the 
> column. The only case where this is not necessary is when the column is the 
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient 
> to just do a NEXT instead of a reseek (especially if your KV is the next 
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that 
> need to be returned to the client, which is another advantage we'd get from 
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and 
> merge between them and the incoming KVs, using NEXT instead of a reseek. We 
> could potentially use a reseek if the number of columns in the table is 
> beyond a certain threshold.
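
The merge described in the issue summary above can be sketched roughly as
follows. This is plain Java, not the actual filter: column names are compared
with ordinary String ordering as a stand-in for KeyValue.COMPARATOR, each loop
iteration models one NEXT, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class NextMergeSketch {
    /**
     * Walks the row's columns in sorted order, one step at a time, while a
     * cursor advances over the sorted set of wanted columns. No seeking is
     * modeled; every incoming column is simply visited in turn.
     */
    static List<String> matchByNext(List<String> sortedRowColumns, TreeSet<String> wanted) {
        List<String> targets = new ArrayList<>(wanted); // already in sorted order
        List<String> matched = new ArrayList<>();
        int t = 0;
        for (String column : sortedRowColumns) { // each iteration stands for one NEXT
            // Skip wanted columns that sort before the current one: the row lacks them.
            while (t < targets.size() && targets.get(t).compareTo(column) < 0) {
                t++;
            }
            if (t == targets.size()) {
                break; // every wanted column is accounted for
            }
            if (targets.get(t).equals(column)) {
                matched.add(column);
                t++;
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        TreeSet<String> wanted = new TreeSet<>(List.of("d", "b"));
        System.out.println(matchByNext(List.of("a", "b", "c", "d"), wanted));
    }
}
```

A threshold on the number of columns, as the summary suggests, would decide
when to fall back to a reseek instead of stepping; that part is not modeled
here.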



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
