Hari Sankar Sivarama Subramaniyan created CALCITE-1069:
----------------------------------------------------------
Summary: Grouping ID implementation to support Hive
Key: CALCITE-1069
URL: https://issues.apache.org/jira/browse/CALCITE-1069
Project: Calcite
Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Julian Hyde
Grouping sets are currently implemented in Calcite using one indicator bit per
grouping column. For instance, consider the following GROUP BY clause:
GROUP BY CUBE (a, b)
The generated Aggregate operator in Calcite will have a row schema consisting
of [a, b, GROUPING(a), GROUPING(b)], where GROUPING(x) is a boolean indicator
field that represents whether x participates in the grouping set that produced
the row.
In contrast, Hive's implementation stores a single number corresponding to the
GROUPING bit vector associated with a row (this is the result of the
GROUPING_ID function in RDBMSs such as Microsoft SQL Server and Oracle). Thus,
the row schema of the Aggregate operator is [a, b, GROUPING_ID(a,b)].
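To make the difference between the two representations concrete, here is a
minimal sketch of how per-column indicator bits could be packed into a single
number and unpacked again. The class and method names are hypothetical and are
not part of Calcite or Hive. The sketch assumes the first column maps to the
most significant bit, as in the GROUPING_ID function of Oracle and SQL Server,
and it treats each indicator as an opaque bit without assuming what a set bit
means.

```java
// Hypothetical illustration only; not Calcite or Hive code.
class GroupingIdSketch {

    /**
     * Packs one indicator bit per grouping column into a single number.
     * Assumption: the first column becomes the most significant bit.
     */
    static long groupingId(boolean... indicators) {
        long id = 0L;
        for (boolean bit : indicators) {
            id = (id << 1) | (bit ? 1L : 0L);
        }
        return id;
    }

    /** Recovers the per-column indicator bits from a packed grouping id. */
    static boolean[] indicators(long id, int columnCount) {
        boolean[] bits = new boolean[columnCount];
        for (int i = 0; i < columnCount; i++) {
            // Column i sits (columnCount - 1 - i) positions from the low bit.
            bits[i] = ((id >> (columnCount - 1 - i)) & 1L) == 1L;
        }
        return bits;
    }

    public static void main(String[] args) {
        // CUBE (a, b) yields four grouping sets, so the two indicator
        // fields take every combination of values.
        boolean[][] rows = {
            {false, false}, {false, true}, {true, false}, {true, true}
        };
        for (boolean[] row : rows) {
            System.out.println("GROUPING(a)=" + row[0]
                + " GROUPING(b)=" + row[1]
                + " GROUPING_ID(a,b)=" + groupingId(row));
        }
    }
}
```

Under these assumptions the two schemas carry the same information: either
form can be derived from the other, so the mismatch is one of representation
rather than of content.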
This difference creates a mismatch between Calcite and Hive. For now, we work
around this mismatch on the Hive side: we create our own GROUPING_ID function
applied over those columns. However, we have some issues related to predicate
pushdown, constant propagation, the join-project transpose rule (HIVE-12923),
etc., that we need to keep solving as, for example, new rules are added to our
optimizer. In short, this is making the code on the Hive side harder and harder
to maintain.
This JIRA is intended to modify the implementation on the Calcite side so that
we need not make workarounds/hacks in Hive to support grouping IDs.