[jira] [Commented] (PHOENIX-258) Use skip scan when SELECT DISTINCT on leading row key column(s)

James Taylor (JIRA) Sun, 29 May 2016 10:52:33 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306009#comment-15306009 ]


James Taylor commented on PHOENIX-258:
--------------------------------------

Thanks for the updates, [~lhofhansl]. That's a good trick with the ordering of
the filters, and it makes sense that the more selective the filter, the less
optimization you'd get.

bq. With "reverse" you mean explicit ORDER BY
Yes, like this: {{SELECT prefix1 FROM t GROUP BY prefix1 ORDER BY prefix1 DESC}}.
In this case, the client-side sort is optimized out and a reverse scan is run
instead, so your DistinctPrefixFilter will need to generate the seek hint
differently. The way to check for this on the client (rather than what I
mentioned before) is: {{plan.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY}}
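To illustrate why the hint has to differ, here's a minimal sketch (hypothetical, not the actual DistinctPrefixFilter code) that models the table as a sorted set of string row keys sharing a fixed-width prefix. Going forward, the seek hint is the smallest key past the current prefix. Going in reverse, every key with the current prefix sorts at or after the prefix itself, so the hint is the greatest key strictly below it. The class {{DistinctPrefixScanSketch}}, its methods, and the string-key model are all assumptions made for illustration.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of a distinct-prefix skip scan: the table is a
 * sorted set of string row keys, each starting with a fixed-width prefix.
 * NOT Phoenix's DistinctPrefixFilter; it only illustrates the seek hints.
 */
public class DistinctPrefixScanSketch {

    // Smallest fixed-width prefix greater than p.
    // Assumption: the last char of p is not Character.MAX_VALUE (no carry needed).
    static String successor(String p) {
        char last = p.charAt(p.length() - 1);
        return p.substring(0, p.length() - 1) + (char) (last + 1);
    }

    // Returns the distinct prefixes in scan order, visiting one row per prefix.
    // Assumption: every row key is at least prefixLen characters long.
    static List<String> distinctPrefixes(TreeSet<String> rows, int prefixLen, boolean reverse) {
        List<String> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        String row = reverse ? rows.last() : rows.first();
        while (row != null) {
            String prefix = row.substring(0, prefixLen);
            out.add(prefix);
            // Forward: seek past every key sharing this prefix.
            // Reverse: every key with this prefix is >= prefix, so seek to the
            // greatest key strictly below the prefix itself.
            row = reverse ? rows.lower(prefix) : rows.ceiling(successor(prefix));
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(List.of("aa1", "aa2", "ab1", "ab7", "ac3"));
        System.out.println(distinctPrefixes(rows, 2, false)); // [aa, ab, ac]
        System.out.println(distinctPrefixes(rows, 2, true));  // [ac, ab, aa]
    }
}
{code}

Byte-level details, such as carrying when the last prefix byte overflows, are left out of this model.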

bq. I don't quite follow the RVC example
The {{col}} variable is used both for the optimization and to determine how
many slots to use when running the DistinctPrefixFilter. In the RVC example,
{{col}} would turn out to be just 1, since there'll be a single expression (the
RVC expression), but it spans two slots. There are also other weird cases
possible, like this:
{code}
SELECT prefix1 FROM t GROUP BY prefix1, TRUNC(prefix1)
{code}
In that case, {{col}} would be set to 2, but it really should be 1. By letting
OrderPreservingTracker track it for you, you'll handle all the weird cases,
both turning off the filter when it doesn't apply and setting the number of
slots correctly.
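A rough model of the slot-counting problem: if each GROUP BY expression is reduced to the row key (PK) positions it references, the slot count is the length of the contiguous leading run of positions covered, not the number of expressions. The names {{PrefixSlotCountSketch}}, {{GroupByExpr}} and {{leadingSlots}} are hypothetical, the RVC is assumed to span the first two PK columns, and the model ignores the order-preserving checks that OrderPreservingTracker also performs.

{code:java}
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical model: each GROUP BY expression is described only by the row key
 * (PK) positions it references. NOT Phoenix's OrderPreservingTracker API.
 */
public class PrefixSlotCountSketch {

    // text is kept only for readability; the model uses pkPositions alone.
    record GroupByExpr(String text, List<Integer> pkPositions) {}

    // Number of leading PK slots covered, or -1 if the covered positions are not
    // a contiguous leading prefix (in which case the filter shouldn't be used).
    static int leadingSlots(List<GroupByExpr> exprs) {
        Set<Integer> covered = new TreeSet<>();
        for (GroupByExpr e : exprs) {
            covered.addAll(e.pkPositions());
        }
        int slots = 0;
        while (covered.contains(slots)) {
            slots++;
        }
        return slots == covered.size() ? slots : -1;
    }

    public static void main(String[] args) {
        // Assumed RVC over prefix1, prefix2: one expression, two slots.
        System.out.println(leadingSlots(List.of(
                new GroupByExpr("(prefix1, prefix2)", List.of(0, 1))))); // 2
        // GROUP BY prefix1, TRUNC(prefix1): two expressions, one slot.
        System.out.println(leadingSlots(List.of(
                new GroupByExpr("prefix1", List.of(0)),
                new GroupByExpr("TRUNC(prefix1)", List.of(0))))); // 1
        // GROUP BY prefix2 alone: not a leading prefix, so don't use the filter.
        System.out.println(leadingSlots(List.of(
                new GroupByExpr("prefix2", List.of(1))))); // -1
    }
}
{code}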

> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-258
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: ryang-sfdc
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 258-WIP.txt, 258-v1.txt, 258-v2.txt, 258-v3.txt, 
> 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt, 258.txt, 
> DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table foo(a varchar(32) not null, date date not null constraint pk
> primary key(a,date))
> PLAN:
> CLIENT PARALLEL 94-WAY FULL SCAN OVER foo
>     SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]
> CLIENT MERGE SORT
> We should skip scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
