[ 
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated PHOENIX-258:
---------------------------------
    Attachment: in-clause.png

Yes indeed. Here's a rudimentary test result -- my leading column has a 
cardinality of 7 values. These whisker plots show the query times when I do or 
do not provide those 7 values via an 'in' clause. The query itself is an 
audit-style query, doing a group-by/count of events bucketed per hour. Providing 
the values, and thus informing the skip scanner, makes for a noticeable 
improvement in overall query time. I ran each query 10 times, alternating 
between the two.
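
To illustrate the idea for readers of this thread, here is a toy model of a 
skip scan over sorted row keys. It is only a sketch and not Phoenix's 
implementation: the in-memory list, the function names, and the use of 
Python's bisect as a stand-in for a scanner seek are all assumptions made for 
the example.

```python
from bisect import bisect_left

# Toy model only: `rows` stands in for a table sorted by its composite
# row key (a, hour). bisect_left plays the role of a scanner "seek".

def count_for_known_prefixes(rows, prefixes):
    """Count rows per leading value when the values are supplied up front
    (the 'in' clause case): seek to the start and end of each prefix."""
    counts = {}
    for p in sorted(prefixes):
        start = bisect_left(rows, (p,))
        # p + "\x00" is the smallest string that sorts after p.
        end = bisect_left(rows, (p + "\x00",), start)
        counts[p] = end - start
    return counts

def distinct_leading_values(rows):
    """Find distinct leading values without reading every row: after
    seeing a value, seek past all keys that share it."""
    distinct, seeks, pos = [], 0, 0
    while pos < len(rows):
        a = rows[pos][0]
        distinct.append(a)
        pos = bisect_left(rows, (a + "\x00",), pos)
        seeks += 1
    return distinct, seeks

# Hypothetical data: 3 leading values, 1000 hourly buckets each.
rows = sorted((a, hour) for a in ["u1", "u2", "u3"] for hour in range(1000))
counts = count_for_known_prefixes(rows, ["u1", "u2", "u3"])
values, seeks = distinct_leading_values(rows)
```

In this model the distinct values are found with one seek per value rather 
than one read per row, which is the behaviour the issue asks for.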

> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-258
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: ryang-sfdc
>              Labels: gsoc2016
>             Fix For: 4.8.0
>
>         Attachments: 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary 
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],["    SERVER 
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]          
>    
> We should skip scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
