[
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306009#comment-15306009
]
James Taylor commented on PHOENIX-258:
--------------------------------------
Thanks for the updates, [~lhofhansl]. That's a good trick with the ordering of
the filters, and it makes sense that the more selective the filter, the less
optimization you'd get.
bq. With "reverse" you mean explicit ORDER BY
Yes, like this {{SELECT prefix1 FROM t GROUP BY prefix1 ORDER BY prefix1
DESC}}. In this case, the client-side sort is optimized out and a reverse scan
will be run instead, so your DistinctPrefixFilter will need to generate the
seek hint differently. The way to check on the client is like this (not as I
mentioned before): {{plan.getOrderBy() == OrderBy.REV_ROW_KEY_ORDER_BY}}
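To illustrate why the seek hint has to differ for a reverse scan, here is a minimal sketch. It is a toy model, not the actual DistinctPrefixFilter: a {{TreeSet<String>}} stands in for the sorted row keys, the prefix is assumed to be a fixed number of ASCII characters, and the names {{DistinctPrefixSketch}}, {{nextKey}} and {{distinctPrefixes}} are made up for this example. Going forward, the hint is the smallest key past every row with the current prefix. Going in reverse, the scan has to land on the largest key strictly below the current prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: a TreeSet of strings stands in for sorted row keys.
class DistinctPrefixSketch {

    // Smallest string that sorts after every key starting with `prefix`.
    // Assumes the last character is not Character.MAX_VALUE.
    static String nextKey(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Returns each distinct n-character prefix once, in scan order.
    // Assumes every key has at least n characters.
    static List<String> distinctPrefixes(TreeSet<String> rows, int n, boolean reverse) {
        List<String> out = new ArrayList<>();
        if (rows.isEmpty()) {
            return out;
        }
        String cur = reverse ? rows.last() : rows.first();
        while (cur != null) {
            String prefix = cur.substring(0, n);
            out.add(prefix);
            cur = reverse
                    ? rows.lower(prefix)             // last row of the previous prefix
                    : rows.ceiling(nextKey(prefix)); // first row of the next prefix
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(List.of("aa1", "aa2", "ab1", "ba1", "ba2"));
        System.out.println(distinctPrefixes(rows, 2, false));
        System.out.println(distinctPrefixes(rows, 2, true));
    }
}
```

In both directions each prefix costs one seek instead of a walk over all of its rows, but the forward hint (increment the prefix) would jump the wrong way if it were reused for the reverse scan.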
bq. I don't quite follow the RCV example
The {{cols}} variable is used both for the optimization and to determine how
many slots to use while the DistinctPrefixFilter is running. In the RVC
example, {{cols}} would turn out to be just 1, since there'll be a single
expression (the RVC expression), but it spans two slots. Also, there are other
weird cases possible, like this:
{code}
SELECT prefix1 FROM t GROUP BY prefix1, TRUNC(prefix1)
{code}
In that case, {{cols}} would be set to 2, but it really should be 1. By letting
OrderPreservingTracker track it for you, you'll handle all the weird cases
(both to turn off the usage of the filter and to set the number of slots
correctly).
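The difference between counting GROUP BY expressions and counting row key slots can be sketched as follows. This is a hypothetical model and does not use OrderPreservingTracker or any Phoenix API: the class {{SlotCountSketch}} and its {{Expr}} type are invented for the example, and each expression is simply assumed to know which row key slot it starts at and how many slots it spans.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical model: not Phoenix code, only an illustration of the counting.
class SlotCountSketch {

    // Stand-in for a GROUP BY expression over the row key.
    static final class Expr {
        final String text;
        final int firstSlot; // first row key slot the expression covers
        final int span;      // number of consecutive slots it covers

        Expr(String text, int firstSlot, int span) {
            this.text = text;
            this.firstSlot = firstSlot;
            this.span = span;
        }
    }

    // Naive approach: one slot per GROUP BY expression.
    static int naiveCount(List<Expr> groupBy) {
        return groupBy.size();
    }

    // Number of leading row key slots covered without a gap, starting at slot 0.
    // A result of 0 means the prefix filter could not be used at all.
    static int leadingSlots(List<Expr> groupBy) {
        BitSet covered = new BitSet();
        for (Expr e : groupBy) {
            covered.set(e.firstSlot, e.firstSlot + e.span);
        }
        return covered.nextClearBit(0);
    }

    public static void main(String[] args) {
        List<Expr> rvc = List.of(new Expr("(prefix1, prefix2)", 0, 2));
        List<Expr> trunc = List.of(new Expr("prefix1", 0, 1), new Expr("TRUNC(prefix1)", 0, 1));
        System.out.println(naiveCount(rvc) + " vs " + leadingSlots(rvc));
        System.out.println(naiveCount(trunc) + " vs " + leadingSlots(trunc));
    }
}
```

In this model the RVC case counts one expression but covers two slots, and the TRUNC case counts two expressions but covers only one slot, which are the two mismatches described above.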
> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
> Key: PHOENIX-258
> URL: https://issues.apache.org/jira/browse/PHOENIX-258
> Project: Phoenix
> Issue Type: Task
> Reporter: ryang-sfdc
> Assignee: Lars Hofhansl
> Fix For: 4.8.0
>
> Attachments: 258-WIP.txt, 258-v1.txt, 258-v2.txt, 258-v3.txt,
> 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt, 258.txt,
> DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],[" SERVER
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]
>
> We should skip scan.