[jira] [Commented] (CALCITE-1069) In Aggregate, deprecate indicators, and allow GROUPING to be used as an aggregate function

Julian Hyde (JIRA) Thu, 08 Jun 2017 10:42:45 -0700

    [ 
https://issues.apache.org/jira/browse/CALCITE-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043110#comment-16043110
 ]


Julian Hyde commented on CALCITE-1069:
--------------------------------------

I have created a pull request. Please review. This could potentially break 
Hive, so I need a +1 from a developer involved with Hive.

I have endeavored to make this backwards compatible by still allowing 
Aggregate with indicator = true, but that path is not well tested. I strongly 
suggest that people convert to indicator = false. There are many benefits; for 
example, rules that were written for non-grouping-sets queries should work with 
grouping sets unchanged or with minor modifications. (See CALCITE-461 for the 
pain that has caused.)

> In Aggregate, deprecate indicators, and allow GROUPING to be used as an 
> aggregate function
> ------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1069
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Julian Hyde
>
> Grouping sets are currently implemented in Calcite using a bit to indicate 
> each of the grouping columns. For instance, consider the following group by 
> clause:
> GROUP BY CUBE (a, b)
> The generated Aggregate operator in Calcite will have a row schema consisting 
> of [a, b, GROUPING(a), GROUPING(b)], where GROUPING( x ) is a boolean field 
> indicator which represents whether x is participating in the group by clause.
> In contrast, Hive's implementation stores a single number corresponding to 
> the GROUPING bit vector associated with a row (this is the result of the 
> GROUPING_ID function in RDBMSs such as Microsoft SQL Server, Oracle, etc.). 
> Thus, the row schema of the Aggregate operator in Hive is 
> [a, b, GROUPING_ID(a,b)].
> This difference creates a mismatch between Calcite and Hive. As of now, 
> we work around this mismatch on the Hive side: we create our own GROUPING_ID 
> function applied over those columns. However, we have some issues related to 
> predicate pushdown, constant propagation, the join project transpose rule 
> (HIVE-12923), etc., that we need to keep solving as new rules are added to 
> the Hive optimizer. In short, this is making the code on the Hive side harder 
> and harder to maintain. 
> This JIRA is intended to modify the implementation on the Calcite side so 
> that we need not make workarounds/hacks in Hive to support grouping IDs.
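
To make the two row schemas in the description concrete, here is a minimal 
Python sketch for GROUP BY CUBE (a, b). It is illustrative only and is not 
Calcite or Hive code: the helper names are hypothetical, and it assumes the 
usual SQL convention that a column's GROUPING bit is 1 when the column is 
rolled up (absent from the grouping set) and that GROUPING_ID packs the bits 
with the first argument as the most significant bit. The description above 
only says that the bit indicates whether the column participates, so the 
polarity here is an assumption.

```python
from itertools import product


def cube_grouping_sets(columns):
    """Enumerate the grouping sets of CUBE(columns): every subset of the columns."""
    return [
        tuple(col for col, keep in zip(columns, mask) if keep)
        for mask in product([True, False], repeat=len(columns))
    ]


def indicator_bits(columns, grouping_set):
    """One flag per column (assumed polarity): 1 if rolled up, 0 if grouped."""
    return [0 if col in grouping_set else 1 for col in columns]


def grouping_id(bits):
    """Pack the per-column flags into one integer, first column most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


columns = ["a", "b"]
for grouping_set in cube_grouping_sets(columns):
    bits = indicator_bits(columns, grouping_set)
    # Indicator-style schema: [a, b, GROUPING(a), GROUPING(b)]
    # Bit-vector-style schema: [a, b, GROUPING_ID(a, b)]
    print(grouping_set, bits, grouping_id(bits))
```

The per-column flags and the single integer carry the same information. The 
difference is that the first form adds one extra output column per grouping 
column, while the second adds exactly one column regardless of how many 
columns are grouped.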



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
