[jira] [Commented] (PHOENIX-258) Use skip scan when SELECT DISTINCT on leading row key column(s)

James Taylor (JIRA) Tue, 31 May 2016 10:20:49 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308159#comment-15308159
 ]


James Taylor commented on PHOENIX-258:
--------------------------------------

Thanks, [~lhofhansl]. Please file a separate bug for the RVC issue - sounds 
like something is wrong here unrelated to your work. The following two queries 
should return the same number of rows:
{code}
SELECT DISTINCT (p1, p2) FROM t GROUP BY (p1, p2)
SELECT DISTINCT p1, p2 FROM t GROUP BY p1, p2
{code}
With the result set you get back for the first query, you'd only be able to 
select the VARBINARY value through resultSet.getBytes(1), while the second one 
would have a separate expression for each of the two columns and maintain the 
type information. It's definitely an edge case.

The logic for calculating the seek next hint for a reverse scan with a variable 
length last field isn't quite correct, though it's another edge case. Let's say 
you have the following four rows: a, a\0xFF, a\0xFF\0xFF, b. If you're at b 
doing a reverse scan, your seek next hint would be a\0xFF, which would skip too 
far, skipping a\0xFF\0xFF. You have to pad with some number of arbitrary 0xFF 
bytes (and there'd always be the case of not adding enough, but it's the best 
we can do). The only time this can happen is if the last field is VARBINARY, as 
0xFF isn't a valid byte for VARCHAR or DECIMAL.
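
To make the byte ordering concrete, here's a rough sketch of the example above. 
It is not the code from the patch: the class and method names 
(ReverseSeekHintSketch, reverseSeekHint, reverseSeek, sampleRows) and the 
padLength parameter are made up for illustration, and a reverse seek is modeled 
as "land on the greatest row at or before the hint" over an in-memory sorted 
set, which is an assumption about how the hint gets used.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Phoenix or HBase.
final class ReverseSeekHintSketch {

    /** Unsigned lexicographic byte order, i.e. how row keys sort. */
    static final Comparator<byte[]> UNSIGNED_LEX = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(x[i] & 0xFF, y[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        // On a shared prefix, the shorter key sorts first.
        return Integer.compare(x.length, y.length);
    };

    /**
     * Builds a hint that sorts just before {@code key}: decrement the last
     * byte, then append {@code padLength} 0xFF bytes. More padding gets closer
     * to the true predecessor, but no finite padding is always enough.
     */
    static byte[] reverseSeekHint(byte[] key, int padLength) {
        if (key.length == 0) {
            throw new IllegalArgumentException("nothing sorts before the empty key");
        }
        int last = key[key.length - 1] & 0xFF;
        if (last == 0) {
            // Dropping a trailing 0x00 gives the exact predecessor; no padding needed.
            return Arrays.copyOf(key, key.length - 1);
        }
        byte[] hint = Arrays.copyOf(key, key.length + padLength);
        hint[key.length - 1] = (byte) (last - 1);
        Arrays.fill(hint, key.length, hint.length, (byte) 0xFF);
        return hint;
    }

    /** Assumed reverse-seek behavior: the greatest row at or before the hint. */
    static byte[] reverseSeek(TreeSet<byte[]> rows, byte[] hint) {
        return rows.floor(hint);
    }

    /** The four rows from the example: a, a\0xFF, a\0xFF\0xFF, b. */
    static TreeSet<byte[]> sampleRows() {
        TreeSet<byte[]> rows = new TreeSet<>(UNSIGNED_LEX);
        rows.add(new byte[] {'a'});
        rows.add(new byte[] {'a', (byte) 0xFF});
        rows.add(new byte[] {'a', (byte) 0xFF, (byte) 0xFF});
        rows.add(new byte[] {'b'});
        return rows;
    }

    public static void main(String[] args) {
        TreeSet<byte[]> rows = sampleRows();
        byte[] b = {'b'};
        // One pad byte: the hint is a\0xFF, so the seek skips a\0xFF\0xFF.
        System.out.println(Arrays.toString(reverseSeek(rows, reverseSeekHint(b, 1))));
        // Four pad bytes: the hint sorts after a\0xFF\0xFF, so that row is found.
        System.out.println(Arrays.toString(reverseSeek(rows, reverseSeekHint(b, 4))));
    }
}
```

With a single pad byte the seek from b lands on a\0xFF and misses a\0xFF\0xFF; 
with four pad bytes it lands on a\0xFF\0xFF. A row with five or more trailing 
0xFF bytes would still be missed, which is the "not adding enough" case.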

Make sense?

> Use skip scan when SELECT DISTINCT on leading row key column(s)
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-258
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-258
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: ryang-sfdc
>            Assignee: Lars Hofhansl
>             Fix For: 4.8.0
>
>         Attachments: 258-WIP.txt, 258-v1.txt, 258-v10.txt, 258-v2.txt, 
> 258-v3.txt, 258-v4.txt, 258-v5.txt, 258-v6.txt, 258-v7.txt, 258-v8.txt, 
> 258-v9.txt, 258.txt, DistinctFixedPrefixFilter.java, in-clause.png
>
>
> create table(a varchar(32) not null, date date not null constraint pk primary 
> key(a,date))
> [["PLAN"],["CLIENT PARALLEL 94-WAY FULL SCAN OVER foo"],["    SERVER 
> AGGREGATE INTO ORDERED DISTINCT ROWS BY [a]"],["CLIENT MERGE SORT"]]          
>    
> We should skip scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
