[
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308159#comment-15308159
]
James Taylor commented on PHOENIX-258:
--------------------------------------
Thanks, [~lhofhansl]. Please file a separate bug for the RVC issue - sounds
like something is wrong here unrelated to your work. The following two queries
should return the same number of rows:
{code}
SELECT DISTINCT (p1, p2) FROM t GROUP BY (p1, p2)
SELECT DISTINCT p1, p2 FROM t GROUP BY p1, p2
{code}
With the result set you get back for the first query, you'd only be able to
select the var binary value through resultSet.getBytes(1), while the second one
would have a separate expression for each of the two columns and maintain the
type information. It's definitely an edge case.
The logic for calculating the seek next hint for a reverse scan with a variable
length last field isn't quite correct, though it's another edge case. Let's say
you have the following four rows: a, a\0xFF, a\0xFF\0xFF, b. If you're at b
doing a reverse scan, your seek next hint would be a\0xFF which would skip too
far, skipping a\0xFF\0xFF. You have to pad with some number of arbitrary 0xFF
bytes (and there'd always be the case of not adding enough, but it's the best
we can do). The only time this can happen is if the last field is VARBINARY, as
0xFF isn't a valid byte for VARCHAR or DECIMAL.
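To make the edge case concrete, here is a minimal sketch of the padding idea, not the actual Phoenix code. The class ReverseSeekHintSketch and the helpers previousKeyHint, reverseSeek and exampleRows are hypothetical names, and a TreeSet ordered by unsigned byte comparison stands in for the sorted row keys, with floor() playing the role of a reverse seek (it returns the greatest key at or before the hint). It assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class ReverseSeekHintSketch {

    /**
     * Hypothetical helper: approximates the largest key sorting before `key`
     * by decrementing the last byte and appending `padLength` 0xFF bytes.
     * With a variable-length last field the padding can never be "enough".
     */
    public static byte[] previousKeyHint(byte[] key, int padLength) {
        if (key.length == 0) {
            throw new IllegalArgumentException("empty key has no predecessor");
        }
        int last = key.length - 1;
        if (key[last] == 0) {
            // x\0x00 is immediately preceded by x, so no padding is needed.
            return Arrays.copyOf(key, last);
        }
        byte[] hint = Arrays.copyOf(key, key.length + padLength);
        hint[last]--; // unsigned decrement; 0x00 was handled above
        Arrays.fill(hint, key.length, hint.length, (byte) 0xFF);
        return hint;
    }

    /** Stand-in for a reverse seek: greatest row key <= hint, or null. */
    public static byte[] reverseSeek(TreeSet<byte[]> rows, byte[] hint) {
        return rows.floor(hint);
    }

    /** The four rows from the example: a, a\0xFF, a\0xFF\0xFF, b. */
    public static TreeSet<byte[]> exampleRows() {
        TreeSet<byte[]> rows = new TreeSet<>(Arrays::compareUnsigned);
        rows.add(new byte[] {0x61});
        rows.add(new byte[] {0x61, (byte) 0xFF});
        rows.add(new byte[] {0x61, (byte) 0xFF, (byte) 0xFF});
        rows.add(new byte[] {0x62});
        return rows;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02X ", x & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        TreeSet<byte[]> rows = exampleRows();
        byte[] b = {0x62};
        // One pad byte reproduces the problem; eight is an arbitrary choice.
        for (int pad : new int[] {1, 8}) {
            byte[] hint = previousKeyHint(b, pad);
            System.out.println("pad=" + pad + " hint=" + hex(hint)
                    + " lands on=" + hex(reverseSeek(rows, hint)));
        }
    }
}
```

With a single pad byte the seek from b lands on a\0xFF and never visits a\0xFF\0xFF; with eight pad bytes it lands on a\0xFF\0xFF. A row with more trailing 0xFF bytes than the pad length would still be skipped, which is the "not adding enough" case.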
Make sense?
> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
> Key: PHOENIX-258
> URL: https://issues.apache.org/jira/browse/PHOENIX-258
> Project: Phoenix
> Issue Type: Task
> Reporter: ryang-sfdc
> Assignee: Lars Hofhansl
> Fix For: 4.8.0
>
> Attachments: 258-WIP.txt, 258-v1.txt, 258-v10.txt, 258-v2.txt,
> 258-v3.txt, 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt,
> 258-v9.txt, 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],[" SERVER
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]
>
> We should skip scan.