[jira] [Commented] (PHOENIX-258) Use skip scan when SELECT DISTINCT on leading row key column(s)

Mujtaba Chohan (JIRA) Thu, 21 Jul 2016 17:12:34 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388655#comment-15388655 ]


Mujtaba Chohan commented on PHOENIX-258:
----------------------------------------

[~lhofhansl] Tested on a table with 300M rows (230 GB) split into 78 regions on a
16-node cluster. {{SELECT DISTINCT ORGANIZATION_ID FROM T}}, with 2 distinct values
for the leading row key column, takes *90* seconds, compared to 110 seconds for a
full-scan {{COUNT(*)}} aggregation.

{code}
Explain for select distinct:
CLIENT 78-CHUNK  PARALLEL 78-WAY FULL SCAN OVER T
    SERVER FILTER BY FIRST KEY ONLY
    SERVER DISTINCT PREFIX FILTER OVER ORGANIZATION_ID
    SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ORGANIZATION_ID
CLIENT MERGE SORT
{code}
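The {{SERVER DISTINCT PREFIX FILTER OVER ORGANIZATION_ID}} step appears to be what turns this into a skip scan: once a region has produced a row for a given leading-key value, it can seek past every other row sharing that value instead of reading them. Below is a minimal, self-contained Java sketch of that seek pattern, assuming String row keys whose first 3 characters stand in for ORGANIZATION_ID, and using {{TreeMap.ceilingKey}} as a stand-in for an HBase seek. The class and method names are hypothetical; this is not Phoenix's DistinctPrefixFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a distinct-prefix skip scan over one region's rows.
// Row keys are Strings; the first PREFIX_LEN chars play the role of the
// leading row key column (ORGANIZATION_ID). Not Phoenix's actual filter.
public class DistinctPrefixScan {
    static final int PREFIX_LEN = 3;

    /** Distinct prefixes found plus the number of rows the scan had to read. */
    static final class Result {
        final List<String> prefixes;
        final int rowsRead;

        Result(List<String> prefixes, int rowsRead) {
            this.prefixes = prefixes;
            this.rowsRead = rowsRead;
        }
    }

    /** Full scan: read every row, emit a prefix whenever it changes. */
    static Result fullScan(TreeMap<String, String> region) {
        List<String> out = new ArrayList<>();
        int reads = 0;
        String last = null;
        for (String key : region.keySet()) {
            reads++;
            String prefix = key.substring(0, PREFIX_LEN);
            if (!prefix.equals(last)) {
                out.add(prefix);
                last = prefix;
            }
        }
        return new Result(out, reads);
    }

    /** Skip scan: read one row per prefix, then seek past all rows sharing it. */
    static Result skipScan(TreeMap<String, String> region) {
        List<String> out = new ArrayList<>();
        int reads = 0;
        String key = region.isEmpty() ? null : region.firstKey();
        while (key != null) {
            reads++;
            String prefix = key.substring(0, PREFIX_LEN);
            out.add(prefix);
            // First key at or after prefix + the max char, i.e. past every key
            // starting with prefix (assumes no key contains that max char).
            key = region.ceilingKey(prefix + '\uffff');
        }
        return new Result(out, reads);
    }

    public static void main(String[] args) {
        // Two organizations with 1,000 rows each, in one region.
        TreeMap<String, String> region = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            region.put("00A" + String.format("%05d", i), "row");
            region.put("00B" + String.format("%05d", i), "row");
        }
        Result full = fullScan(region);
        Result skip = skipScan(region);
        System.out.println("full scan: " + full.prefixes + " rows read = " + full.rowsRead);
        System.out.println("skip scan: " + skip.prefixes + " rows read = " + skip.rowsRead);
        // prints:
        // full scan: [00A, 00B] rows read = 2000
        // skip scan: [00A, 00B] rows read = 2
    }
}
```

In an HBase filter the equivalent move is, roughly, returning {{ReturnCode.SEEK_NEXT_USING_HINT}} and supplying the seek target from {{getNextCellHint}}. The sketch only counts rows read, so it ignores the block reads and per-seek costs a real region server still pays.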

> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-258
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: ryang-sfdc
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 258-WIP.txt, 258-v1.txt, 258-v10.txt, 258-v11.txt, 
> 258-v12.txt, 258-v13.txt, 258-v14.txt, 258-v15.txt, 258-v16.txt, 258-v17.txt, 
> 258-v2.txt, 258-v3.txt, 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 
> 258-v8.txt, 258-v9.txt, 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table foo(a varchar(32) not null, date date not null constraint pk primary
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],["    SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]
>
> We should skip scan.
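In the schema from the issue description the leading column {{a}} is a variable-length VARCHAR, so the seek target depends on how the composite row key is laid out. The sketch below assumes a Phoenix-like layout of the UTF-8 bytes of {{a}}, a 0x00 separator byte, then 8 bytes for {{date}} (here a big-endian epoch-millis long with the sign bit flipped, an encoding chosen only for illustration). Under that assumption, the smallest key past every row with a given {{a}} is {{a}} followed by 0x01. The class {{CompositeKeyDistinct}} and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch for PRIMARY KEY (a VARCHAR, date DATE).
// Assumed row key layout (Phoenix-like, not copied from Phoenix):
//   UTF-8 bytes of a | 0x00 separator | 8 bytes of date
public class CompositeKeyDistinct {
    static final byte SEPARATOR = 0x00;

    /** Builds a row key; assumes a contains no U+0000 character. */
    static byte[] rowKey(String a, long dateMillis) {
        byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(aBytes.length + 1 + Long.BYTES)
                .put(aBytes)
                .put(SEPARATOR)
                // Flip the sign bit so unsigned byte order matches numeric order.
                .putLong(dateMillis ^ Long.MIN_VALUE)
                .array();
    }

    /** Decodes the leading column: every byte before the first separator. */
    static String leadingColumn(byte[] key) {
        int end = 0;
        while (key[end] != SEPARATOR) {
            end++;
        }
        return new String(key, 0, end, StandardCharsets.UTF_8);
    }

    /** Smallest key that sorts after every row whose leading column equals a. */
    static byte[] seekKeyAfter(String a) {
        byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);
        byte[] seek = Arrays.copyOf(aBytes, aBytes.length + 1);
        seek[aBytes.length] = (byte) (SEPARATOR + 1); // 0x01 sorts just above 0x00
        return seek;
    }

    /** SELECT DISTINCT a: one read per distinct value, then a seek. */
    static List<String> distinctLeading(TreeMap<byte[], String> table) {
        List<String> out = new ArrayList<>();
        byte[] key = table.isEmpty() ? null : table.firstKey();
        while (key != null) {
            String a = leadingColumn(key);
            out.add(a);
            key = table.ceilingKey(seekKeyAfter(a));
        }
        return out;
    }

    /** Row keys compare as unsigned bytes, as HBase orders them. */
    static TreeMap<byte[], String> newTable() {
        return new TreeMap<byte[], String>((x, y) -> Arrays.compareUnsigned(x, y));
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> table = newTable();
        for (String a : new String[] {"alpha", "alphabet", "beta", "gamma"}) {
            for (long day = 0; day < 100; day++) {
                table.put(rowKey(a, day * 86_400_000L), "row");
            }
        }
        System.out.println(table.size() + " rows, distinct a = " + distinctLeading(table));
        // prints: 400 rows, distinct a = [alpha, alphabet, beta, gamma]
    }
}
```

Including both {{alpha}} and {{alphabet}} in the sample shows why, under this layout, the seek key ends in 0x01 instead of simply incrementing the last byte of {{a}}: incrementing would turn {{alpha}} into {{alphb}} and jump straight over every {{alphabet}} row.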



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
