[ https://issues.apache.org/jira/browse/CALCITE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843081#comment-16843081 ]
Feng Zhu commented on CALCITE-3077:
-----------------------------------

I am working on this issue now, and look forward to hearing some suggestions.

> Rewrite CUBE&ROLLUP queries in SparkSqlDialect
> ----------------------------------------------
>
>                 Key: CALCITE-3077
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3077
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.20.0
>            Reporter: Feng Zhu
>            Priority: Major
>
> *Background:* we are building a platform that adopts Calcite to process
> (i.e., parse, validate, convert, and optimize) SQL queries and then regenerate the
> final SQL. To handle large data volumes, we use the popular
> SparkSQL engine to execute the generated SQL query.
> However, we found that a large share of real-world test cases failed, due to syntax
> differences in the *_CUBE/ROLLUP/GROUPING SETS_* clauses. The Spark SQL dialect supports only
> "WITH ROLLUP" and "WITH CUBE" in the "GROUP BY" clause. The corresponding grammar [1] is
> defined as below.
> {code:java}
> aggregation
>     : GROUP BY groupingExpressions+=expression (',' groupingExpressions+=expression)*
>       ( WITH kind=ROLLUP
>       | WITH kind=CUBE
>       | kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')')?
>     | GROUP BY kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')'
>     ;
> {code}
> To fill this gap, I think we need to rewrite CUBE/ROLLUP/GROUPING SETS
> clauses in SparkSqlDialect, especially for some complex cases, e.g.:
> {code:java}
> group by cube ((a, b), (c, d))
> group by cube(a, b), cube(c, d)
> {code}
> [1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
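One possible rewrite direction for the complex case `cube((a, b), (c, d))` is to expand it into the equivalent GROUPING SETS list (every subset of the composite groups, flattened), since the Spark grammar above does accept GROUPING SETS after GROUP BY. The sketch below is purely illustrative (Python rather than Calcite's Java unparser code, and the function name is hypothetical), showing only the expansion logic:

```python
from itertools import combinations

def cube_to_grouping_sets(groups):
    # Expand CUBE over composite column groups into the equivalent
    # GROUPING SETS list: one flattened tuple per subset of groups.
    # cube((a, b), (c, d)) -> ((a, b, c, d), (a, b), (c, d), ())
    sets = []
    for size in range(len(groups), -1, -1):
        for combo in combinations(groups, size):
            sets.append(tuple(col for group in combo for col in group))
    return sets

print(cube_to_grouping_sets([("a", "b"), ("c", "d")]))
# -> [('a', 'b', 'c', 'd'), ('a', 'b'), ('c', 'd'), ()]
```

A 2-group CUBE yields 2^2 = 4 grouping sets; in general, n composite groups expand to 2^n sets, which is why a dialect-level rewrite (rather than asking users to hand-expand) is attractive.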