[ 
https://issues.apache.org/jira/browse/HIVE-12923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879135#comment-15879135
 ] 

Julian Hyde commented on HIVE-12923:
------------------------------------

I'm thinking of an alternative solution to CALCITE-1069. Currently, as you 
know, an Aggregate with more than one grouping set returns more columns than 
one with only one grouping set. We have been arguing about whether there should 
be 1 extra column (Hive's preference) or N extra columns (Calcite's preference).

My new proposal is that there should be no extra columns. We make GROUPING into 
an aggregate function, and if you want those extra columns you can add calls to 
GROUPING.

If the row type of Aggregate is the same regardless of the number of grouping 
sets, it will simplify a bunch of things. For example, it would be easier to 
write a rule that pushes down the Filter "group_id = 2", because we wouldn't 
have to worry about columns disappearing, or about whether they are used.
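
To make the idea concrete, here is a toy model in Python. It is only a sketch 
and is not Calcite or Hive code: the names aggregate, count_star and grouping 
are hypothetical, and the bitmap convention (1 for a rolled-up column, leftmost 
argument as the high bit) is assumed from the standard SQL GROUPING function. 
It shows that the output width depends only on the group keys and the 
aggregate calls, and that a grouping column exists only when it is asked for.

```python
def aggregate(rows, keys, grouping_sets, agg_calls):
    """Toy Aggregate: emits one row per (grouping set, group).

    Output columns are always keys + agg_calls, however many grouping
    sets there are. A key that is rolled up in a grouping set is None.
    """
    out = []
    for gset in grouping_sets:
        groups = {}
        for row in rows:
            group_key = tuple(row[k] if k in gset else None for k in keys)
            groups.setdefault(group_key, []).append(row)
        for group_key, members in groups.items():
            out.append(group_key + tuple(call(members, gset) for call in agg_calls))
    return out


def count_star(members, gset):
    """COUNT(*): number of input rows in the group."""
    return len(members)


def grouping(*cols):
    """GROUPING(cols...) modeled as an ordinary aggregate call.

    Returns a bitmap with a 1 bit for each listed column that is NOT
    in the current grouping set (assumed convention, see lead-in).
    """
    def call(members, gset):
        bits = 0
        for c in cols:
            bits = (bits << 1) | (0 if c in gset else 1)
        return bits
    return call


rows = [{"a": 1, "b": 10}, {"a": 1, "b": 20}, {"a": 2, "b": 10}]
keys = ("a", "b")
cube = [("a", "b"), ("a",), ("b",), ()]  # GROUP BY a, b WITH CUBE

# Same row width with one grouping set or four: a, b, count.
single = aggregate(rows, keys, [("a", "b")], [count_star])
multi = aggregate(rows, keys, cube, [count_star])

# The extra column appears only because GROUPING was requested.
with_grouping = aggregate(rows, keys, cube, [count_star, grouping("a", "b")])

# A filter such as "group_id = 2" refers to an explicit, stable column.
filtered = [r for r in with_grouping if r[-1] == 2]
```

In this model a rule that pushes a filter on the grouping value down into the 
Aggregate only has to look at an explicit aggregate call, not at columns that 
come and go with the number of grouping sets.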

[~hsubramaniyan], [~jcamachorodriguez], would the new proposal be acceptable to 
Hive?

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> groupby_grouping_sets4.q failure
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12923
>                 URL: https://issues.apache.org/jira/browse/HIVE-12923
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CBO
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>         Attachments: HIVE-12923.1.patch, HIVE-12923.2.patch
>
>
> {code}
> EXPLAIN
> SELECT * FROM
> (SELECT a, b, count(*) from T1 where a < 3 group by a, b with cube) subq1
> join
> (SELECT a, b, count(*) from T1 where a < 3 group by a, b with cube) subq2
> on subq1.a = subq2.a
> {code}
> Stack trace:
> {code}
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory.pruneJoinOperator(ColumnPrunerProcFactory.java:1110)
>         at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory.access$400(ColumnPrunerProcFactory.java:85)
>         at org.apache.hadoop.hive.ql.optimizer.ColumnPrunerProcFactory$ColumnPrunerJoinProc.process(ColumnPrunerProcFactory.java:941)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.optimizer.ColumnPruner$ColumnPrunerWalker.walk(ColumnPruner.java:172)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at org.apache.hadoop.hive.ql.optimizer.ColumnPruner.transform(ColumnPruner.java:135)
>         at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:237)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10176)
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:229)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>         at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:74)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:239)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:472)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:312)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1168)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1256)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1094)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1129)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1103)
>         at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:10444)
>         at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_grouping_sets4(TestCliDriver.java:3313)
> {code}



