[ 
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906564#comment-13906564
 ] 

Anoop Sam John edited comment on PHOENIX-29 at 2/20/14 3:33 AM:
----------------------------------------------------------------

bq.Why not add this filter in the beginning?
No Lars, we cannot. Consider the query below.
Select name, address from people where age=25;
The new filter will contain only these 2 columns (name, address), and all 
other KVs will be filtered out. For the condition we will have an SCVF, which 
then comes as the 2nd filter. Because the 1st filter drops the age KVs, the 
SCVF will never see the KV of the condition column.
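
The ordering problem can be sketched with a toy model. None of the types 
below are HBase or Phoenix classes; CellFilter, projection, columnEquals and 
scanRow are hypothetical names, and the sketch assumes that a cell only 
reaches a filter if every earlier filter in the list included it, and that a 
missing condition column counts as "no match".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: these types are hypothetical, not HBase or Phoenix classes.
class FilterOrderSketch {

    interface CellFilter {
        void reset();                                     // start of each row
        boolean includeCell(String column, String value); // false = drop this cell
        boolean dropRow();                                // true = drop the whole row
    }

    // Stand-in for the projection filter: lets through only the selected columns.
    static CellFilter projection(String... columns) {
        final Set<String> keep = new HashSet<>(Arrays.asList(columns));
        return new CellFilter() {
            public void reset() {}
            public boolean includeCell(String column, String value) {
                return keep.contains(column);
            }
            public boolean dropRow() { return false; }
        };
    }

    // Stand-in for the WHERE-clause filter.
    // Assumption: if the condition column is never seen, the row does not match.
    static CellFilter columnEquals(final String column, final String expected) {
        return new CellFilter() {
            private boolean matched;
            public void reset() { matched = false; }
            public boolean includeCell(String c, String v) {
                if (c.equals(column)) matched = v.equals(expected);
                return true; // this filter never drops individual cells
            }
            public boolean dropRow() { return !matched; }
        };
    }

    // A cell reaches a filter only if every earlier filter included it.
    static Map<String, String> scanRow(Map<String, String> row, List<CellFilter> filters) {
        for (CellFilter f : filters) f.reset();
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            boolean included = true;
            for (CellFilter f : filters) {
                if (!f.includeCell(cell.getKey(), cell.getValue())) {
                    included = false;
                    break; // later filters never see this cell
                }
            }
            if (included) out.put(cell.getKey(), cell.getValue());
        }
        for (CellFilter f : filters) {
            if (f.dropRow()) return Collections.emptyMap();
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("address", "Pune");
        row.put("age", "25");
        row.put("name", "Ann");
        // Condition first, projection second: the condition sees the age cell.
        System.out.println(scanRow(row,
            Arrays.asList(columnEquals("age", "25"), projection("name", "address"))));
        // Projection first: the age cell is dropped before the condition sees it.
        System.out.println(scanRow(row,
            Arrays.asList(projection("name", "address"), columnEquals("age", "25"))));
    }
}
```

In this model the second ordering loses a row that actually satisfies 
age=25, which is the reason the projection filter cannot go first.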

bq.keep your filter at the end like you had it before and make the ExplainTable 
more forgiving of the FilterList order. It's better to have the PageFilter 
before yours so that it reduces the number of rows over which you're mucking 
with the KeyValues.
Yes, I think I can keep it at the end. My concern about needing PageFilter at 
the end is probably not an issue. My thinking was that any filter that 
depends on the number of rows is better placed at the end. But this 
particular combination of ColumnProjectionFilter and then PageFilter looks 
like no problem. Lars can correct me if I am wrong.
PageFilter uses filterAllRemaining to signal that no more scanning is needed, 
so even if it is not at the end, I feel it makes little difference.
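
A small toy model of that point, not the real HBase classes: RowFilter, 
PageLimit, KeepEveryRow and scan are hypothetical names, and the sketch 
assumes the scanner stops as soon as any filter in the list reports 
filterAllRemaining, and that the projection filter never drops a whole row.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical types, loosely shaped like row-level filters.
class PageStopSketch {

    interface RowFilter {
        boolean filterRow();          // true = drop the current row
        boolean filterAllRemaining(); // true = no further rows are needed
    }

    // Stand-in for a page filter: accepts up to pageSize rows, then stops.
    static final class PageLimit implements RowFilter {
        private final int pageSize;
        private int accepted;
        PageLimit(int pageSize) { this.pageSize = pageSize; }
        public boolean filterRow() {
            accepted++;
            return accepted > pageSize;
        }
        public boolean filterAllRemaining() { return accepted >= pageSize; }
    }

    // Stand-in for the projection filter: it trims cells but keeps every row.
    static final class KeepEveryRow implements RowFilter {
        public boolean filterRow() { return false; }
        public boolean filterAllRemaining() { return false; }
    }

    // Returns {rows returned, rows examined} for a table of totalRows rows.
    static int[] scan(int totalRows, List<RowFilter> filters) {
        int returned = 0;
        int examined = 0;
        for (int row = 0; row < totalRows; row++) {
            // Assumption: any filter, at any position, can end the scan.
            boolean done = false;
            for (RowFilter f : filters) {
                if (f.filterAllRemaining()) { done = true; break; }
            }
            if (done) break;
            examined++;
            boolean drop = false;
            for (RowFilter f : filters) {
                if (f.filterRow()) { drop = true; break; }
            }
            if (!drop) returned++;
        }
        return new int[] { returned, examined };
    }

    public static void main(String[] args) {
        int[] pageFirst = scan(10, Arrays.asList(new PageLimit(3), new KeepEveryRow()));
        int[] pageLast = scan(10, Arrays.asList(new KeepEveryRow(), new PageLimit(3)));
        System.out.println(Arrays.toString(pageFirst));
        System.out.println(Arrays.toString(pageLast));
    }
}
```

Under these assumptions both orderings return the same rows and stop at the 
same point. A filter that drops whole rows would be a different story, 
because then the page counter could see rows that are later rejected.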

Still, I am +1 on James's suggestion to keep it at the end, as in patch V1. I 
will make that change and post the new version once the UTs pass.

Dealing with combinations of filters in a FilterList is tricky. I wonder how 
easy or difficult it is for users. Without some knowledge of the internal 
code flow, things can sometimes go wrong. :(




> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than in 
> selecting any other column. The reason is that when you project a column into 
> a Scan, HBase uses its ExplicitColumnTracker which does a reseek to the 
> column. The only case where this is not necessary is when the column is the 
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient 
> to just do a NEXT instead of a reseek (especially if your KV is the next 
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that 
> need to be returned back to the client which is another advantage we'd get 
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and 
> merge between them and the incoming KVs, using NEXT instead of a reseek. We 
> could potentially use a reseek if the number of columns in the table is 
> beyond a certain threshold.
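
The merge described in the issue can be sketched roughly as below. This is a 
toy illustration, not the patch: columns are plain strings compared with 
String.compareTo in place of KeyValue.COMPARATOR, the row is assumed to 
arrive already sorted, and NextMergeSketch and select are hypothetical names. 
Each loop iteration stands for one NEXT over the row; no reseek is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Toy model only: strings stand in for KeyValues, compareTo for the comparator.
class NextMergeSketch {

    // Walks the sorted row once and returns the wanted columns present in it.
    static List<String> select(List<String> sortedRowColumns, Collection<String> wanted) {
        // Sort (and de-duplicate) the wanted columns once, up front.
        List<String> sortedWanted = new ArrayList<>(new TreeSet<>(wanted));
        List<String> out = new ArrayList<>();
        int w = 0;
        for (String column : sortedRowColumns) { // one NEXT per iteration
            // Skip wanted columns that sort before the current cell:
            // they are not present in this row.
            while (w < sortedWanted.size() && sortedWanted.get(w).compareTo(column) < 0) {
                w++;
            }
            if (w == sortedWanted.size()) {
                break; // nothing left to find; the rest of the row is not needed
            }
            if (sortedWanted.get(w).equals(column)) {
                out.add(column);
                w++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("address", "age", "email", "name", "phone");
        System.out.println(select(row, Arrays.asList("name", "address")));
    }
}
```

A threshold on the number of columns, as the description suggests, could 
decide when to fall back to a reseek instead of stepping cell by cell.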



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
