[
https://issues.apache.org/jira/browse/CALCITE-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043110#comment-16043110
]
Julian Hyde commented on CALCITE-1069:
--------------------------------------
I have created a pull request. Please review. This could potentially break
Hive, so I need a +1 from a developer involved with Hive.
I have endeavored to make this backwards compatible by still allowing
Aggregate with indicator = true, but that path is not well tested. I strongly
suggest that people convert to indicator = false. There are many benefits; for
example, rules that were written for queries without grouping sets should work
with grouping sets unchanged or with minor modifications. (See CALCITE-461 for
the pain that has caused.)
> In Aggregate, deprecate indicators, and allow GROUPING to be used as an
> aggregate function
> ------------------------------------------------------------------------------------------
>
> Key: CALCITE-1069
> URL: https://issues.apache.org/jira/browse/CALCITE-1069
> Project: Calcite
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Julian Hyde
>
> Grouping sets are currently implemented in Calcite using a bit to indicate
> each of the grouping columns. For instance, consider the following group by
> clause:
> GROUP BY CUBE (a, b)
> The generated Aggregate operator in Calcite will have a row schema consisting
> of [a, b, GROUPING(a), GROUPING(b)], where GROUPING( x ) is a boolean field
> indicator which represents whether x is participating in the group by clause.
> In contrast, Hive's implementation stores a single number corresponding to
> the GROUPING bit vector associated with a row (this is the result of the
> GROUPING_ID function in RDBMSs such as Microsoft SQL Server, Oracle, etc.).
> Thus, the row schema of the Aggregate operator is [a, b, GROUPING_ID(a,b)].
> This difference is creating a mismatch between Calcite and Hive. As of now,
> we work around this mismatch on the Hive side: we create our own GROUPING_ID
> function applied over those columns. However, we have some issues related to
> predicate pushdown, constant propagation, the join project transpose rule
> (HIVE-12923), etc., that we need to continue solving as new rules are added
> to the Hive optimizer. In short, this is making the code on the Hive side
> harder and harder to maintain.
> This JIRA is intended to modify the implementation on the Calcite side so
> that we need not make workarounds/hacks in Hive to support Grouping IDs.
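
The two row schemas in the quoted description carry the same information: a
set of per-column indicator bits can be packed into a single GROUPING_ID
number. The following is only a minimal sketch of that packing, not Calcite or
Hive code. The class and method names are hypothetical, and it assumes the
convention used by Oracle and SQL Server, where a bit is 1 when the column is
rolled up and the leftmost argument is the most significant bit. Calcite's
indicator columns and Hive's grouping ID may use a different polarity or bit
order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Calcite or Hive.
class GroupingIdSketch {

    // Packs per-column indicator bits into one number.
    // Assumption: true means "column is rolled up", and the leftmost
    // column becomes the most significant bit.
    static long groupingId(boolean... rolledUp) {
        long id = 0;
        for (boolean bit : rolledUp) {
            id = (id << 1) | (bit ? 1L : 0L);
        }
        return id;
    }

    // Enumerates the 2^n grouping sets of CUBE over n columns,
    // each expressed as a vector of indicator bits.
    static List<boolean[]> cubeIndicators(int n) {
        List<boolean[]> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            boolean[] rolledUp = new boolean[n];
            for (int i = 0; i < n; i++) {
                rolledUp[i] = ((mask >> (n - 1 - i)) & 1) == 1;
            }
            result.add(rolledUp);
        }
        return result;
    }

    public static void main(String[] args) {
        // GROUP BY CUBE (a, b): one line per grouping set.
        for (boolean[] ind : cubeIndicators(2)) {
            System.out.println("[GROUPING(a)=" + ind[0]
                + ", GROUPING(b)=" + ind[1]
                + "] -> GROUPING_ID(a,b)=" + groupingId(ind));
        }
    }
}
```

Under these assumptions, the four grouping sets of CUBE (a, b) map to the
values 0 through 3, so a single numeric column can replace the two boolean
indicator columns.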