[ https://issues.apache.org/jira/browse/PHOENIX-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913344#comment-13913344 ]
James Taylor commented on PHOENIX-76:
-------------------------------------

How about we do the following?
- By default, don't use the new filter if multiple column families are involved.
- Add a NO_SEEK_TO_COLUMN hint that overrides the default.
- Fix the essential column family feature in the way you suggested.

I think post 3.0, even in a point release, we can try to determine whether there are no KVs between one KV and another KV in a different CF, and use your filter in that case as well. To figure this out, we'd need to sort the List<PColumn> in a given PColumnFamily according to KeyValue.COMPARATOR. Then, if the KV is the last one in its CF and the next KV is the first one in another CF, we'd use your filter. Does that make sense?

Until we keep stats, we won't really know which is better in general, so our fallback so far has been to introduce hints.

> Fix perf regression due to PHOENIX-29
> -------------------------------------
>
>                 Key: PHOENIX-76
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-76
>             Project: Phoenix
>          Issue Type: Bug
>   Affects Versions: 3.0.0
>            Reporter: James Taylor
>            Assignee: Anoop Sam John
>             Fix For: 3.0.0
>
>
> Many queries got slower as a result of PHOENIX-29. There are a few simple
> checks we can do to prevent adding the new filter:
> - if the query is an aggregate query, as we don't return KVs in this case, so
> we're only doing extra processing that we don't need. For this, you can check
> statement.isAggregate().
> - if there are multiple column families referenced in the where clause, as
> the seek that gets done is better in this case because we'd potentially be
> seeking over an entire store's worth of data into a different store.
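
The adjacency check proposed in the comment (sort the columns, then see whether one KV is the last in its CF and the next is the first in another CF) could look roughly like the sketch below. It is only an illustration under assumptions: `Col` and `ColumnAdjacency` are hypothetical stand-ins rather than Phoenix classes, plain string ordering substitutes for KeyValue.COMPARATOR, and only declared columns are considered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a Phoenix column: a family name plus a qualifier. */
final class Col {
    final String family;
    final String qualifier;

    Col(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }
}

final class ColumnAdjacency {
    // Assumed ordering within one row: by column family, then by qualifier.
    // Real code would compare bytes the way KeyValue.COMPARATOR does.
    static final Comparator<Col> ORDER =
        Comparator.comparing((Col c) -> c.family).thenComparing(c -> c.qualifier);

    /**
     * Returns true when 'to' directly follows 'from' in sorted order and the
     * two columns belong to different families, i.e. no declared column sits
     * between them.
     */
    static boolean adjacentAcrossFamilies(List<Col> declared, Col from, Col to) {
        List<Col> sorted = new ArrayList<>(declared);
        sorted.sort(ORDER);
        for (int i = 0; i + 1 < sorted.size(); i++) {
            Col a = sorted.get(i);
            Col b = sorted.get(i + 1);
            if (ORDER.compare(a, from) == 0 && ORDER.compare(b, to) == 0) {
                return !a.family.equals(b.family);
            }
        }
        return false;
    }
}
```

Requiring the two columns to be direct neighbours in the sorted list covers both conditions at once: the first must be the last column of its family and the second the first column of the next family. Columns that are not declared in the schema are invisible to this check, so it is at best a heuristic.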