[ https://issues.apache.org/jira/browse/PHOENIX-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912600#comment-13912600 ]

Anoop Sam John commented on PHOENIX-76:
---------------------------------------

bq. if there are multiple column families referenced in the where clause,
Maybe not always. Say the where clause references 2 CFs and we select columns 
from both of those CFs (other columns, or other columns plus the where-clause 
columns); then we might not see any perf degradation.
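The two checks proposed in the issue description could be expressed as a small decision helper. This is a hypothetical sketch, not Phoenix code: `NewFilterCheck` and `shouldAddNewFilter` are made-up names, and the caller is assumed to already know whether the query aggregates (the description suggests `statement.isAggregate()`) and which column families the where clause references. The multi-family rule is encoded exactly as proposed, even though it may not hold for every query.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not Phoenix code) encoding the two checks proposed in
 * the issue description for skipping the filter added by PHOENIX-29.
 */
public class NewFilterCheck {

    /**
     * @param isAggregate         whether the query aggregates (the issue suggests
     *                            statement.isAggregate() as the source of this flag)
     * @param whereClauseFamilies column families referenced in the where clause
     * @return true if the new filter should be added to the scan
     */
    static boolean shouldAddNewFilter(boolean isAggregate, Set<String> whereClauseFamilies) {
        if (isAggregate) {
            // No KVs are returned for aggregate queries, so the filter is only extra work.
            return false;
        }
        if (whereClauseFamilies.size() > 1) {
            // Proposed rule: with several families in the where clause the plain seek
            // is expected to be cheaper. This may not hold for every query.
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The TABLE_6CF query filters on B (CF2) and F (CF6).
        Set<String> where = new HashSet<>(Arrays.asList("CF2", "CF6"));
        System.out.println(shouldAddNewFilter(false, where));                        // false
        System.out.println(shouldAddNewFilter(false, Collections.singleton("CF1"))); // true
        System.out.println(shouldAddNewFilter(true, Collections.singleton("CF1")));  // false
    }
}
```

Keeping the two checks as separate early returns makes it easy to refine the multi-family rule on its own, for example if it turns out to depend on which columns are selected.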

Going through the table DDL and the queries that degraded the most, I think I 
know the issue. I have not looked at every case yet, but I checked the cases 
with a big degradation.

This is the No. 1 issue:
TABLE_6CF
CREATE TABLE IF NOT EXISTS $TABLE (K1 CHAR(1) NOT NULL, K2 VARCHAR NOT NULL, 
CF1.A INTEGER, CF2.B INTEGER, CF3.C INTEGER, CF4.D INTEGER, CF5.E INTEGER, 
CF6.F INTEGER CONSTRAINT PK PRIMARY KEY (K1,K2)) SPLIT ON ('B','C','D');

select a,b,c,d,e,f from TABLE_6CF where B>1000 and B<2000 and f>1000 and f<2000
before : 1.31 sec
after patch : 8.09 sec

Because of the new filter added to the FilterList, we cannot make use of the 
essential-family-based optimization!
So the new Filter added by PHOENIX-29 should also implement 
isFamilyEssential(byte[] name). That is easy to add.
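A minimal, self-contained model of why the extra filter defeats the optimization, assuming (as HBase's FilterList and FilterBase appear to behave) that a family is essential for the whole list as soon as any member filter reports it essential, and that a filter which does not override isFamilyEssential reports every family as essential. All classes below are hypothetical plain-Java stand-ins, not the HBase Filter API; the families and where-clause columns come from the TABLE_6CF query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Simplified model (NOT the HBase API) of essential-family checks in a filter list. */
public class EssentialFamilySketch {

    /** Stand-in for a filter; the default mirrors a base filter returning true. */
    interface SketchFilter {
        default boolean isFamilyEssential(String family) {
            return true; // conservative default: every family must be read eagerly
        }
    }

    /** Hypothetical where-clause predicate on a single family. */
    static final class ValueFilter implements SketchFilter {
        private final String family;
        ValueFilter(String family) { this.family = family; }
        @Override
        public boolean isFamilyEssential(String f) { return family.equals(f); }
    }

    /** Hypothetical stand-in for the new filter WITHOUT an override. */
    static final class NewFilterWithoutOverride implements SketchFilter { }

    /** Same filter, reporting only the where-clause families as essential. */
    static final class NewFilterWithOverride implements SketchFilter {
        private final Set<String> whereFamilies;
        NewFilterWithOverride(Set<String> whereFamilies) { this.whereFamilies = whereFamilies; }
        @Override
        public boolean isFamilyEssential(String f) { return whereFamilies.contains(f); }
    }

    /** A family is essential to the list if ANY member filter says so. */
    static final class SketchFilterList implements SketchFilter {
        private final List<SketchFilter> filters;
        SketchFilterList(List<SketchFilter> filters) { this.filters = filters; }
        @Override
        public boolean isFamilyEssential(String f) {
            for (SketchFilter filter : filters) {
                if (filter.isFamilyEssential(f)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Families that would be read eagerly; the rest could be loaded on demand. */
    static Set<String> essentialFamilies(SketchFilter filter, List<String> families) {
        Set<String> result = new TreeSet<>();
        for (String f : families) {
            if (filter.isFamilyEssential(f)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> families = Arrays.asList("CF1", "CF2", "CF3", "CF4", "CF5", "CF6");
        Set<String> where = new TreeSet<>(Arrays.asList("CF2", "CF6")); // B and F

        List<SketchFilter> before = new ArrayList<>();
        before.add(new ValueFilter("CF2"));
        before.add(new ValueFilter("CF6"));

        List<SketchFilter> broken = new ArrayList<>(before);
        broken.add(new NewFilterWithoutOverride());

        List<SketchFilter> fixed = new ArrayList<>(before);
        fixed.add(new NewFilterWithOverride(where));

        System.out.println("before: " + essentialFamilies(new SketchFilterList(before), families));
        System.out.println("broken: " + essentialFamilies(new SketchFilterList(broken), families));
        System.out.println("fixed:  " + essentialFamilies(new SketchFilterList(fixed), families));
    }
}
```

Running main prints [CF2, CF6] for the original filters, all six families once the non-overriding filter is added, and [CF2, CF6] again when the new filter overrides isFamilyEssential. In this model, a single filter left at the default is enough to force every family to be read eagerly.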

I am going through the other cases one by one as well.

> Fix perf regression due to PHOENIX-29
> -------------------------------------
>
>                 Key: PHOENIX-76
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-76
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: James Taylor
>            Assignee: Anoop Sam John
>             Fix For: 3.0.0
>
>
> Many queries got slower as a result of PHOENIX-29. There are a few simple 
> checks we can do to prevent adding the new filter:
> - if the query is an aggregate query, since we don't return KVs in this case 
> and would only be doing extra processing that we don't need. For this, you can 
> check statement.isAggregate().
> - if there are multiple column families referenced in the where clause, as 
> the seek that gets done is better in this case, because we'd potentially be 
> seeking over an entire store's worth of data into a different store.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
