[ 
https://issues.apache.org/jira/browse/CALCITE-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879141#comment-15879141
 ] 

Julian Hyde commented on CALCITE-1069:
--------------------------------------

I'm thinking of an alternative solution. Currently, as you know, an 
{{Aggregate}} with more than one grouping set returns more columns than one 
with only one grouping set. We have been arguing in HIVE-12923 about whether 
there should be 1 extra column (Hive's preference) or N extra columns 
(Calcite's preference).
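
To make the two layouts concrete, here is a small illustrative sketch in plain Python. It is not Calcite or Hive code, and the helper names are made up for this example. It assumes the SQL-standard convention that a GROUPING bit is 1 when the column is rolled up (absent from the grouping set), and that GROUPING_ID packs those bits with the first column as the most significant bit.

```python
from itertools import combinations

def cube(columns):
    """All grouping sets of CUBE(columns), from the full set down to the empty set."""
    return [frozenset(c)
            for r in range(len(columns), -1, -1)
            for c in combinations(columns, r)]

def grouping_bits(columns, grouping_set):
    # Assumed convention: 1 = column is rolled up, 0 = column is grouped on.
    return [0 if c in grouping_set else 1 for c in columns]

def grouping_id(columns, grouping_set):
    # Pack the per-column bits into one integer, first column most significant.
    gid = 0
    for bit in grouping_bits(columns, grouping_set):
        gid = (gid << 1) | bit
    return gid

columns = ["a", "b"]

# "N extra columns": one indicator per grouping column.
n_column_schema = columns + [f"GROUPING({c})" for c in columns]
# "1 extra column": a single packed bit vector.
one_column_schema = columns + [f"GROUPING_ID({','.join(columns)})"]

# (grouping set, per-column bits, packed id) for each grouping set of CUBE (a, b)
table = [(sorted(gs), grouping_bits(columns, gs), grouping_id(columns, gs))
         for gs in cube(columns)]

print(n_column_schema)
print(one_column_schema)
for entry in table:
    print(entry)
```

Both layouts carry the same information; they differ only in how many columns are appended to the row type.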

My new proposal is that there should be no extra columns. We make {{GROUPING}} 
into an aggregate function, and if you want those extra columns you can add 
calls to {{GROUPING}}.

If the row type of {{Aggregate}} is the same regardless of the number of grouping 
sets, it will simplify a bunch of things. For example, it would be easier to 
write a rule that pushes down the Filter "group_id = 2", because we wouldn't 
have to worry about disappearing columns, and whether they are used.
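
A rough sketch of the idea, as a toy model in Python rather than actual Calcite code: the output row is always the group keys followed by the requested aggregate calls, and GROUPING is just another aggregate call. The function names, the sample rows, and the bit convention (1 = column rolled up, first argument most significant) are assumptions made for this illustration.

```python
def aggregate(rows, group_keys, grouping_sets, agg_calls):
    """Toy aggregate: output columns are group_keys plus one column per agg call."""
    out = []
    for gs in grouping_sets:
        groups = {}
        for row in rows:
            # Columns outside the current grouping set are rolled up to None.
            key = tuple(row[k] if k in gs else None for k in group_keys)
            groups.setdefault(key, []).append(row)
        for key, members in groups.items():
            out.append(key + tuple(call(members, gs) for call in agg_calls))
    return out

def count_star(members, grouping_set):
    return len(members)

def grouping(*cols):
    """GROUPING modelled as an aggregate call that only looks at the grouping set."""
    def call(members, grouping_set):
        gid = 0
        for c in cols:
            gid = (gid << 1) | (0 if c in grouping_set else 1)
        return gid
    return call

rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
keys = ["a", "b"]
cube_sets = [{"a", "b"}, {"a"}, {"b"}, set()]

single = aggregate(rows, keys, [{"a", "b"}], [count_star])
cube_plain = aggregate(rows, keys, cube_sets, [count_star])
cube_with_grouping = aggregate(rows, keys, cube_sets,
                               [count_star, grouping("a", "b")])

# A filter such as "group_id = 2" refers to an ordinary, explicitly requested column.
filtered = [r for r in cube_with_grouping if r[-1] == 2]
print(filtered)
```

In this model `single` and `cube_plain` have the same row width, and the grouping column exists only because the caller asked for it, so a rule never has to reason about implicitly added columns.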

> In Aggregate, combine GROUPING columns into one GROUP_ID column
> ---------------------------------------------------------------
>
>                 Key: CALCITE-1069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1069
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Julian Hyde
>
> Grouping sets are currently implemented in Calcite using a bit to indicate 
> each of the grouping columns. For instance, consider the following group by 
> clause:
> GROUP BY CUBE (a, b)
> The generated Aggregate operator in Calcite will have a row schema consisting 
> of [a, b, GROUPING(a), GROUPING(b)], where GROUPING( x ) is a boolean field 
> indicator which represents whether x is participating in the group by clause.
> In contrast, Hive's implementation stores a single number corresponding to 
> the GROUPING bit vector associated with a row (this is the result of the 
> GROUPING_ID function in RDBMS such as MSSQLServer, Oracle, etc). Thus, the 
> row schema of the Aggregate operator is [a, b, GROUPING_ID(a,b)].
> This difference is creating a mismatch between Calcite and Hive. As of now, 
> we work around this mismatch on the Hive side: we create our own GROUPING_ID 
> function applied over those columns. However, we have some issues related to 
> predicate pushdown, constant propagation, join project transpose rule 
> (HIVE-12923), etc., that we need to continue solving as new rules are added 
> to the Hive optimizer. In short, this is making the code on the Hive side 
> harder and harder to maintain. 
> This jira is intended to modify the implementation on the Calcite side so 
> that we need not make workarounds/hacks in Hive to support Grouping IDs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
