[
https://issues.apache.org/jira/browse/CALCITE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Feng Zhu updated CALCITE-3077:
------------------------------
Description:
*Background:* we are building a platform that adopts Calcite to process (i.e.,
parse, validate, convert, and optimize) SQL queries and then regenerate the final
SQL. To handle large volumes of data, we use the popular Spark SQL engine to
execute the generated query.
However, we found that a large portion of real-world test cases failed due to
syntax differences in the
*_CUBE/ROLLUP/GROUPING SETS_* clauses. The Spark SQL dialect supports only the
"WITH ROLLUP"/"WITH CUBE" and "GROUPING SETS" forms in the "GROUP BY" clause.
The corresponding grammar [1] is defined below.
{code:java}
aggregation
    : GROUP BY groupingExpressions+=expression (',' groupingExpressions+=expression)* (
      WITH kind=ROLLUP
    | WITH kind=CUBE
    | kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')')?
    | GROUP BY kind=GROUPING SETS '(' groupingSet (',' groupingSet)* ')'
    ;
{code}
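To illustrate the simplest part of the gap, here is a naive, standalone sketch that turns a flat "GROUP BY CUBE(...)"/"GROUP BY ROLLUP(...)" into Spark's "WITH CUBE"/"WITH ROLLUP" form. This is only an illustration on SQL strings: Calcite's actual dialect mechanism unparses SqlNode trees rather than rewriting text, and this regex deliberately ignores nested grouping lists.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCubeRewrite {
    // Naive, illustration-only rewrite of a flat "GROUP BY CUBE(cols)" or
    // "GROUP BY ROLLUP(cols)" into Spark's "GROUP BY cols WITH CUBE/ROLLUP".
    // [^()]* intentionally rejects nested grouping lists like CUBE((a,b),(c,d)),
    // which need a full GROUPING SETS expansion instead.
    static String rewrite(String sql) {
        Pattern p = Pattern.compile("GROUP BY (CUBE|ROLLUP)\\s*\\(([^()]*)\\)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(sql);
        if (m.find()) {
            return m.replaceFirst("GROUP BY " + m.group(2).trim()
                    + " WITH " + m.group(1).toUpperCase());
        }
        return sql;  // no flat CUBE/ROLLUP found; leave the query untouched
    }

    public static void main(String[] args) {
        System.out.println(
            rewrite("SELECT a, b, COUNT(*) FROM t GROUP BY CUBE(a, b)"));
        // -> SELECT a, b, COUNT(*) FROM t GROUP BY a, b WITH CUBE
    }
}
```

A real fix would live in SparkSqlDialect's unparse logic, but the string form shows the target syntax Spark expects.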
To fill this gap, I think we need to rewrite CUBE/ROLLUP/GROUPING SETS clauses
in SparkSqlDialect, especially for complex cases such as:
{code:java}
group by cube ((a, b), (c, d))
group by cube(a,b), cube(c,d)
{code}
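For these complex cases, the rewrite has to fall back to GROUPING SETS: CUBE over n grouping items expands to the power set of those items (2^n sets), and multiple CUBE clauses in one GROUP BY combine as a cross product of their expansions. The sketch below is a standalone illustration of that expansion on plain column lists, not Calcite code:

```java
import java.util.ArrayList;
import java.util.List;

public class CubeExpansion {
    // CUBE over a list of grouping items (each item is a list of columns)
    // is equivalent to GROUPING SETS over the power set of those items.
    // E.g. cube((a, b), (c, d)) -> ((a, b, c, d), (a, b), (c, d), ()).
    static List<List<String>> expandCube(List<List<String>> items) {
        List<List<String>> sets = new ArrayList<>();
        int n = items.size();
        // Iterate all 2^n subsets, from the full set down to the empty set.
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            List<String> set = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    set.addAll(items.get(i));
                }
            }
            sets.add(set);
        }
        return sets;
    }

    // Several CUBE clauses in one GROUP BY combine as a cross product
    // of their individual grouping-set expansions.
    static List<List<String>> crossProduct(List<List<String>> a,
                                           List<List<String>> b) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> x : a) {
            for (List<String> y : b) {
                List<String> set = new ArrayList<>(x);
                set.addAll(y);
                out.add(set);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // group by cube((a, b), (c, d)) -> 4 grouping sets
        System.out.println(expandCube(
            List.of(List.of("a", "b"), List.of("c", "d"))));

        // group by cube(a, b), cube(c, d) -> 4 x 4 = 16 grouping sets
        System.out.println(crossProduct(
            expandCube(List.of(List.of("a"), List.of("b"))),
            expandCube(List.of(List.of("c"), List.of("d")))).size());
    }
}
```

So cube((a, b), (c, d)) yields 4 grouping sets, while cube(a, b), cube(c, d) yields 16, which is why a simple "WITH CUBE" rewrite cannot cover these forms.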
[1] https://github.com/apache/spark/blob/master/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
> Rewrite CUBE&ROLLUP&CUBE queries in SparkSqlDialect
> ---------------------------------------------------
>
> Key: CALCITE-3077
> URL: https://issues.apache.org/jira/browse/CALCITE-3077
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.20.0
> Reporter: Feng Zhu
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)