Hari Sankar Sivarama Subramaniyan created CALCITE-1069:
----------------------------------------------------------

             Summary: Grouping ID implementation to support Hive
                 Key: CALCITE-1069
                 URL: https://issues.apache.org/jira/browse/CALCITE-1069
             Project: Calcite
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan
            Assignee: Julian Hyde


Grouping sets are currently implemented in Calcite using a bit to indicate each
of the grouping columns. For instance, consider the following group by clause:

GROUP BY CUBE (a, b)

The generated Aggregate operator in Calcite will have a row schema consisting
of [a, b, GROUPING(a), GROUPING(b)], where GROUPING(x) is a boolean indicator
field that represents whether x is participating in the group by clause.
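
As a rough illustration of the row layout described above, the following Python
sketch enumerates the grouping sets that CUBE (a, b) expands to and derives one
boolean indicator per column. The helper names cube_grouping_sets and
indicator_row are hypothetical, and the polarity (True when the column
participates) is an assumption taken from the wording above; a real
implementation may use the opposite convention.

```python
from itertools import combinations

def cube_grouping_sets(columns):
    """Return every subset of `columns`, largest first (the CUBE expansion)."""
    sets = []
    for size in range(len(columns), -1, -1):
        for subset in combinations(columns, size):
            sets.append(subset)
    return sets

def indicator_row(columns, grouping_set):
    # One boolean per grouping column. Assumed polarity: True when the
    # column participates in this grouping set.
    return [col in grouping_set for col in columns]

columns = ["a", "b"]
for gs in cube_grouping_sets(columns):
    # Each line corresponds to one [GROUPING(a), GROUPING(b)] pair.
    print(gs, indicator_row(columns, gs))
```

With two grouping columns this yields four grouping sets, so the operator
carries two extra indicator fields alongside a and b.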

In contrast, Hive's implementation stores a single number corresponding to the
GROUPING bit vector associated with a row (this is the result of the
GROUPING_ID function in RDBMSs such as Microsoft SQL Server and Oracle). Thus,
the row schema of the Aggregate operator is [a, b, GROUPING_ID(a,b)].
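
To make the relationship between the two layouts concrete, here is a minimal
Python sketch that packs per-column indicator bits into a single number. The
function name grouping_id is hypothetical, and both the bit order (first column
is the most significant bit) and the polarity of each bit are assumptions for
illustration; the exact encoding used by Hive or by a given RDBMS may differ.

```python
def grouping_id(bits):
    """Pack per-column indicator bits into one integer.

    Assumption: the first column maps to the most significant bit.
    """
    value = 0
    for bit in bits:
        # Shift the bits collected so far and append the next one.
        value = (value << 1) | (1 if bit else 0)
    return value

# Indicators for columns (a, b), one list per grouping set.
for bits in ([False, False], [False, True], [True, False], [True, True]):
    print(bits, grouping_id(bits))
```

Under these assumptions the two boolean fields collapse into one integer field
in the range 0 to 3, which is the shape of the [a, b, GROUPING_ID(a,b)] schema.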

This difference creates a mismatch between Calcite and Hive. As of now, we
work around this mismatch on the Hive side: we create our own GROUPING_ID
function applied over those columns. However, we have some issues related to
predicate pushdown, constant propagation, the join project transpose rule
(HIVE-12923), etc., that we need to keep solving as, e.g., new rules are added
to our optimizer. In short, this is making the code on the Hive side harder and
harder to maintain.

This jira is intended to modify the implementation on the Calcite side so that
we need not make workarounds/hacks in Hive to support Grouping IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
