Stamatis Zampetakis created HIVE-29449:
------------------------------------------
Summary: Avoid ANY distribution in HiveSortExchange RelNode
Key: HIVE-29449
URL: https://issues.apache.org/jira/browse/HIVE-29449
Project: Hive
Issue Type: Task
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
Currently, we are creating a {{HiveSortExchange}} operator with
[ANY|https://github.com/apache/calcite/blob/869fe15f36fa5579f2a1cf289958fa763af36a9b/core/src/main/java/org/apache/calcite/rel/RelDistribution.java#L101]
distribution for queries with a SORT BY clause (and no DISTRIBUTE BY).
+Example+
{code:sql}
explain cbo select * from t1 sort by a
{code}
{noformat}
CBO PLAN:
HiveSortExchange(distribution=[any], collation=[[0]])
  HiveProject(t1.a=[$0], t1.b=[$1])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
{noformat}
However, according to its Javadoc, ANY is not a valid distribution, and there
are even assertions in place to prevent its usage when [creating an
Exchange|https://github.com/apache/calcite/blob/869fe15f36fa5579f2a1cf289958fa763af36a9b/core/src/main/java/org/apache/calcite/rel/core/Exchange.java#L67]
operator. The assertion was circumvented in HIVE-28572 by using the
HiveRelDistribution wrapper, but as a result the plan is left in a
semi-invalid state.
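To illustrate how a wrapper can slip past such a check, here is a minimal, self-contained sketch. It assumes the assertion compares the distribution against a shared singleton by reference. All names below (ExchangeAssertionSketch, SimpleDistribution, WrappedDistribution, passesIdentityCheck, passesTypeCheck) are hypothetical stand-ins, not the actual Calcite or Hive classes.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins only; these are NOT the real Calcite/Hive types.
final class ExchangeAssertionSketch {
  enum Type { HASH_DISTRIBUTED, ANY }

  interface Distribution {
    Type type();
    List<Integer> keys();
  }

  // Plain distribution carrying a type and a list of key ordinals.
  static final class SimpleDistribution implements Distribution {
    private final Type type;
    private final List<Integer> keys;

    SimpleDistribution(Type type, List<Integer> keys) {
      this.type = type;
      this.keys = keys;
    }

    @Override public Type type() { return type; }
    @Override public List<Integer> keys() { return keys; }
  }

  // Hypothetical wrapper: a distinct object that reports the same type and keys.
  static final class WrappedDistribution implements Distribution {
    private final Distribution delegate;

    WrappedDistribution(Distribution delegate) {
      this.delegate = delegate;
    }

    @Override public Type type() { return delegate.type(); }
    @Override public List<Integer> keys() { return delegate.keys(); }
  }

  // Stand-in for a shared ANY singleton.
  static final Distribution ANY =
      new SimpleDistribution(Type.ANY, Collections.<Integer>emptyList());

  // Assumed shape of the check: reference comparison against the singleton.
  static boolean passesIdentityCheck(Distribution d) {
    return d != ANY;
  }

  // A stricter check that inspects the type, so wrappers cannot bypass it.
  static boolean passesTypeCheck(Distribution d) {
    return d.type() != Type.ANY;
  }

  public static void main(String[] args) {
    Distribution wrapped = new WrappedDistribution(ANY);
    System.out.println(passesIdentityCheck(ANY));     // false: the singleton is rejected
    System.out.println(passesIdentityCheck(wrapped)); // true: the wrapper slips through
    System.out.println(passesTypeCheck(wrapped));     // false: the type is still ANY
  }
}
```

Under that assumption, the wrapped object satisfies the check while still reporting the ANY type, which is what leaves the plan in a semi-invalid state.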
An alternative way to represent SORT BY queries (with no DISTRIBUTE BY)
would be to use a HASH distribution with empty keys.
{noformat}
CBO PLAN:
HiveSortExchange(distribution=[hash], collation=[[0]])
  HiveProject(t1.a=[$0], t1.b=[$1])
    HiveTableScan(table=[[default, t1]], table:alias=[t1])
{noformat}
The plan would still not reflect what really happens at the physical layer, but
it would contain a valid distribution type.
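As a rough illustration of the proposed representation, the sketch below models a hash distribution with an empty key list and the label it would show in the plan. The names (EmptyKeyHashSketch, forSortBy, label, hasValidType) are hypothetical, and the rule that keys are printed only when present is an assumption chosen to match the {{distribution=[hash]}} output above; this is not the actual Calcite or Hive code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified model only; it does not use the real Calcite/Hive classes.
final class EmptyKeyHashSketch {
  enum Type {
    HASH("hash"), ANY("any");

    final String shortName;

    Type(String shortName) {
      this.shortName = shortName;
    }
  }

  static final class Distribution {
    final Type type;
    final List<Integer> keys;

    Distribution(Type type, List<Integer> keys) {
      this.type = type;
      this.keys = keys;
    }

    // Assumed rendering: the type name, followed by the keys only when there are any.
    String label() {
      return keys.isEmpty() ? type.shortName : type.shortName + keys;
    }

    // In this model, every type except ANY counts as a valid distribution type.
    boolean hasValidType() {
      return type != Type.ANY;
    }
  }

  // Hypothetical factory for SORT BY without DISTRIBUTE BY: HASH with no keys.
  static Distribution forSortBy() {
    return new Distribution(Type.HASH, Collections.<Integer>emptyList());
  }

  public static void main(String[] args) {
    Distribution sortBy = forSortBy();
    System.out.println("distribution=[" + sortBy.label() + "]"); // distribution=[hash]
    System.out.println(sortBy.hasValidType());                   // true

    Distribution keyed = new Distribution(Type.HASH, Arrays.asList(0));
    System.out.println("distribution=[" + keyed.label() + "]");  // distribution=[hash[0]]
  }
}
```

The point of the model is only that the distribution type becomes a valid one; it says nothing about how rows are actually routed at the physical layer.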