Stamatis Zampetakis created HIVE-29449:
------------------------------------------

             Summary: Avoid ANY distribution in HiveSortExchange RelNode
                 Key: HIVE-29449
                 URL: https://issues.apache.org/jira/browse/HIVE-29449
             Project: Hive
          Issue Type: Task
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Currently, we are creating a {{HiveSortExchange}} operator with 
[ANY|https://github.com/apache/calcite/blob/869fe15f36fa5579f2a1cf289958fa763af36a9b/core/src/main/java/org/apache/calcite/rel/RelDistribution.java#L101]
 distribution for queries with a SORT BY clause (and no DISTRIBUTE BY).

+Example+
{code:sql}
explain cbo select * from t1 sort by a
{code}

{noformat}
CBO PLAN:
HiveSortExchange(distribution=[any], collation=[[0]])
  HiveProject(t1.a=[$0], t1.b=[$1])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
{noformat}

However, ANY is not a valid distribution according to the Javadoc, and there 
are even assertions in place to prevent its use when [creating an 
Exchange|https://github.com/apache/calcite/blob/869fe15f36fa5579f2a1cf289958fa763af36a9b/core/src/main/java/org/apache/calcite/rel/core/Exchange.java#L67]
 operator. The assertion was circumvented in HIVE-28572 by using the 
HiveRelDistribution wrapper, but as a result the plan is left in a 
semi-invalid state. 

An alternative way to represent SORT BY queries (with no DISTRIBUTE BY) 
would be to use a HASH distribution with empty keys. 

{noformat}
CBO PLAN:
HiveSortExchange(distribution=[hash], collation=[[0]])
  HiveProject(t1.a=[$0], t1.b=[$1])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
{noformat}

The plan would still not reflect what really happens at the physical layer, but 
it would contain a valid distribution type.
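
To make the difference between the two representations concrete, here is a minimal, self-contained sketch. It does not use the Calcite or Hive classes: {{Distribution}}, {{Simple}}, {{Wrapped}}, and both check methods are hypothetical stand-ins, and it assumes the guard in Exchange is an identity comparison against the ANY singleton. Under that assumption, a wrapper around ANY slips past the guard while still reporting the ANY type, whereas a HASH distribution with empty keys is a valid value in its own right.

```java
import java.util.List;

/** Simplified stand-in model; none of these types are Calcite or Hive classes. */
public class DistributionSketch {

    enum Type { ANY, HASH_DISTRIBUTED }

    interface Distribution {
        Type type();
        List<Integer> keys();
    }

    /** Plain distribution value holding a type and its key columns. */
    static final class Simple implements Distribution {
        private final Type type;
        private final List<Integer> keys;

        Simple(Type type, List<Integer> keys) {
            this.type = type;
            this.keys = List.copyOf(keys);
        }

        @Override public Type type() { return type; }
        @Override public List<Integer> keys() { return keys; }
    }

    /** Hypothetical wrapper that delegates everything to another distribution. */
    static final class Wrapped implements Distribution {
        private final Distribution delegate;

        Wrapped(Distribution delegate) { this.delegate = delegate; }

        @Override public Type type() { return delegate.type(); }
        @Override public List<Integer> keys() { return delegate.keys(); }
    }

    /** Shared singleton playing the role of the ANY placeholder. */
    static final Distribution ANY = new Simple(Type.ANY, List.of());

    /** Assumed shape of the guard: rejects only the ANY singleton itself. */
    static boolean passesIdentityCheck(Distribution d) {
        return d != ANY;
    }

    /** Stricter check: rejects every distribution whose type is ANY. */
    static boolean hasConcreteType(Distribution d) {
        return d.type() != Type.ANY;
    }

    /** Builds a HASH distribution; an empty key list is allowed. */
    static Distribution hash(List<Integer> keys) {
        return new Simple(Type.HASH_DISTRIBUTED, keys);
    }

    public static void main(String[] args) {
        Distribution wrappedAny = new Wrapped(ANY);
        Distribution emptyHash = hash(List.of());

        // The wrapper is a different object, so the identity guard lets it through...
        System.out.println("wrapped ANY passes identity check: " + passesIdentityCheck(wrappedAny));
        // ...even though its type is still ANY.
        System.out.println("wrapped ANY has concrete type: " + hasConcreteType(wrappedAny));
        // HASH with empty keys satisfies both checks without any wrapping.
        System.out.println("empty HASH passes identity check: " + passesIdentityCheck(emptyHash));
        System.out.println("empty HASH has concrete type: " + hasConcreteType(emptyHash));
    }
}
```

In this model the wrapped ANY passes the identity check but fails the type check, which is one way to read "semi-invalid"; the empty-key HASH value passes both.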
