[ https://issues.apache.org/jira/browse/PHOENIX-5148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chenglei updated PHOENIX-5148: ------------------------------ Description: Given a table {code:java} create table test ( pk1 varchar not null , pk2 varchar not null, pk3 varchar not null, v1 varchar, v2 varchar, CONSTRAINT TEST_PK PRIMARY KEY ( pk1, pk2, pk3 )) {code} Consider following three cases: *1. OrderBy of ClientScanPlan* for sql: {code:java} select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 {code} Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out. *2. GroupBy of ClientAggregatePlan* for sql : {code:java} select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 {code} Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false. *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* for sql: {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time. Thus, both 1) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b{code} and 2) {code:java} SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b{code} should avoid doing an extra order-by after the sort-merge-join operation. All the above three cases are caused by the same problem that the {{OrderPreservingTracker}} relies solely on primary keys for inferring alignment between the target \{{OrderByExpression}}s and the source sortedness. was: Given a table {code:java} create table test ( pk1 varchar not null , pk2 varchar not null, pk3 varchar not null, v1 varchar, v2 varchar, CONSTRAINT TEST_PK PRIMARY KEY ( pk1, pk2, pk3 )) {code} Consider following three cases: *1. OrderBy of ClientScanPlan* for sql: {code:java} select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 {code} Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , but unfortunately it is not compiled out. *2. GroupBy of ClientAggregatePlan* for sql : {code:java} select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 {code} Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return false. *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* for sql: {code}SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) at the same time. Thus, both 1) {code}SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, T1.b{code} and 2) {code}SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, T2.b{code} should avoid doing an extra order-by after the sort-merge-join operation. All the above three cases are caused by the same problem that the {{OrderPreservingTracker}} relies solely on primary keys for inferring alignment between the target {{OrderByExpression}}s and the source sortedness. > Improve OrderPreservingTracker to optimize OrderBy/GroupBy for ClientScanPlan > and ClientAggregatePlan > ----------------------------------------------------------------------------------------------------- > > Key: PHOENIX-5148 > URL: https://issues.apache.org/jira/browse/PHOENIX-5148 > Project: Phoenix > Issue Type: Improvement > Affects Versions: 4.14.1 > Reporter: chenglei > Priority: Major > > Given a table > {code:java} > create table test ( > pk1 varchar not null , > pk2 varchar not null, > pk3 varchar not null, > v1 varchar, > v2 varchar, > CONSTRAINT TEST_PK PRIMARY KEY ( > pk1, > pk2, > pk3 )) > {code} > Consider following three cases: > *1. OrderBy of ClientScanPlan* > for sql: > {code:java} > select v1 from (select v1,v2,pk3 from test t where pk1 = '6' order by > t.v2,t.pk3,t.v1 limit 10) a order by v2,pk3 > {code} > Obviously, the outer {{OrderBy}} "order by v2,pk3" should be compiled out > because it matchs the inner query {{OrderBy}} "order by t.v2,t.pk3,t.v1" , > but unfortunately it is not compiled out. > *2. GroupBy of ClientAggregatePlan* > for sql : > {code:java} > select v1 from (select v1,pk2,pk1 from test t where pk1 = '6' order by > t.pk2,t.v1,t.pk1 limit 10) a group by pk2,v1 > {code} > Obviously, the outer {{GroupBy}} "group by pk2,v1" should be orderPreserving > because it matchs the inner query {{OrderBy}} "order by t.pk2,t.v1,t.pk1" , > but unfortunately the {{isOrderPreserving()}} of outer {{GroupBy}} return > false. > *3. OrderBy of SortMergeJoinPlan(from PHOENIX-4618)* > for sql: > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b{code} > The result of the sort-merge-join is sorted on (T1.a, T1.b) and (T2.a, T2.b) > at the same time. > Thus, both 1) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T1.a, > T1.b{code} > and 2) > {code:java} > SELECT * FROM T1 JOIN T2 ON T1.a = T2.a and T1.b = T2.b ORDER BY T2.a, > T2.b{code} > should avoid doing an extra order-by after the sort-merge-join operation. > > All the above three cases are caused by the same problem that the > {{OrderPreservingTracker}} relies solely on primary keys for inferring > alignment between the target \{{OrderByExpression}}s and the source > sortedness. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)