[ 
https://issues.apache.org/jira/browse/PHOENIX-4560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341612#comment-16341612
 ] 

James Taylor commented on PHOENIX-4560:
---------------------------------------

Attached one more patch to convince myself, [~sergey.soldatov]. I remember when 
I wrote that code I wasn't sure about that check, but left it in because I 
couldn't think of a negative test. The reason it's wrong is that though the 
rows come back in order of the group by key, the second part of the group by 
key (v1) will not necessarily be sorted across all rows if the first part of 
the group by key (ID) is not the same (which it isn't if it's a point lookup).

I've pinged the dev list about precommit a few times, but no one is responding. 
:-(. I bet [~elserj] could fix it (but I don't think he broke it, so that's not 
quite fair either).

> ORDER BY with GROUP BY doesn't work if there is WHERE on pk column
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4560
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4560
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.0
>            Reporter: Sergey Soldatov
>            Assignee: James Taylor
>            Priority: Major
>             Fix For: 4.14.0
>
>         Attachments: PHOENIX-4560_v1.patch, PHOENIX-4560_v2.patch, 
> PHOENIX-4560_v3.patch
>
>
> It's related to the optimizations for group by that were made in 
> PHOENIX-3451. Test case to reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS VA_TEST(ID VARCHAR NOT NULL PRIMARY KEY, VAL1 
> VARCHAR, VAL2 INTEGER);
> UPSERT INTO VA_TEST VALUES('ABC','aa123', 11);
> UPSERT INTO VA_TEST VALUES('ABD','ba124', 1);
> UPSERT INTO VA_TEST VALUES('ABE','cf125', 13);
> UPSERT INTO VA_TEST VALUES('ABF','dan126', 4);
> UPSERT INTO VA_TEST VALUES('ABG','elf127', 15);
> UPSERT INTO VA_TEST VALUES('ABH','fan128', 6);
> UPSERT INTO VA_TEST VALUES('AAA','get211', 100);
> UPSERT INTO VA_TEST VALUES('AAB','hat212', 7);
> UPSERT INTO VA_TEST VALUES('AAC','aap12', 2);
> UPSERT INTO VA_TEST VALUES('AAD','ball12', 3);
> UPSERT INTO VA_TEST VALUES('AAE','inn2110', 13);
> UPSERT INTO VA_TEST VALUES('AAF','key2112', 40);
> select distinct ID, VAL1, VAL2 from VA_TEST where "ID" in 
> ('ABC','ABD','ABE','ABF','ABG','ABH','AAA', 'AAB', 'AAC','AAD','AAE','AAF') 
> order by VAL1 ASC;
> {noformat}
> Execution result :
> {noformat}
> +------+----------+-------+
> |  ID  |   VAL1   | VAL2  |
> +------+----------+-------+
> | AAA  | get211   | 100   |
> | AAB  | hat212   | 7     |
> | AAC  | aap12    | 2     |
> | AAD  | ball12   | 3     |
> | AAE  | inn2110  | 13    |
> | AAF  | key2112  | 40    |
> | ABC  | aa123    | 11    |
> | ABD  | ba124    | 1     |
> | ABE  | cf125    | 13    |
> | ABF  | dan126   | 4     |
> | ABG  | elf127   | 15    |
> | ABH  | fan128   | 6     |
> +------+----------+-------+
> {noformat}
> Explain plan:
> {noformat}
> +----------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> |                                          PLAN                               
>            | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 1-CHUNK 12 ROWS 1908 BYTES PARALLEL 1-WAY POINT LOOKUP ON 12 KEYS 
> OVER VA_TEST  | 1908            | 12             | 1516917709099  |
> |     SERVER AGGREGATE INTO DISTINCT ROWS BY ["ID", "VAL1", "VAL2"]           
>            | 1908            | 12             | 1516917709099  |
> | CLIENT MERGE SORT                                                           
>            | 1908            | 12             | 1516917709099  |
> +----------------------------------------------------------------------------------------+-----------------+----------------+----------------+
> {noformat}
> As we can see there is no client side sort, so it keeps the original order we 
> get from server which is wrong.
> That happens because of OrderPreservingTracker.hasEqualityConstraints decides 
> that WHERE clause on ID column is constant and let us perform the 
> optimization and skip the client side sort.
> [~jamestaylor], [~chenglei] any thoughts? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to