[jira] [Updated] (SPARK-31663) Grouping sets with having clause returns the wrong result

Dongjoon Hyun (Jira) Fri, 08 May 2020 12:03:26 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dongjoon Hyun updated SPARK-31663:
----------------------------------
    Description: 
Grouping sets with having clause returns the wrong result when the condition of 
having contained conflicting naming. See the below example:
{code:java}
select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING 
SETS ((b), (a, b)) having b > 10{code}
The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the 
right result should be
{code:java}
+---+
|  b|
+---+
|  2|
|  2|
+---+{code}
instead of an empty result.

The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as 
Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE 
and ROLLUP have the same issue.

Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 
2017.

 

For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.
{code:java}
spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 10;
Time taken: 0.194 seconds

hive> select sum(a) as b from t group by b grouping sets(b) having b > 10;
2
Time taken: 1.605 seconds, Fetched: 1 row(s) {code}

  was:
Grouping sets with having clause returns the wrong result when the condition of 
having contained conflicting naming. See the below example:
{code:java}
select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING 
SETS ((b), (a, b)) having b > 10{code}
The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the 
right result should be
{code:java}
+---+
|  b|
+---+
|  2|
|  2|
+---+{code}
instead of an empty result.

The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as 
Filter(..., Agg(...)) and resolved these two parts in different rules. The CUBE 
and ROLLUP have the same issue.

Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 
2017.


> Grouping sets with having clause returns the wrong result
> ---------------------------------------------------------
>
>                 Key: SPARK-31663
>                 URL: https://issues.apache.org/jira/browse/SPARK-31663
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0
>            Reporter: Yuanjian Li
>            Priority: Major
>              Labels: correctness
>
> Grouping sets with having clause returns the wrong result when the condition 
> of having contained conflicting naming. See the below example:
> {code:java}
> select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING 
> SETS ((b), (a, b)) having b > 10{code}
> The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the 
> right result should be
> {code:java}
> +---+
> |  b|
> +---+
> |  2|
> |  2|
> +---+{code}
> instead of an empty result.
> The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as 
> Filter(..., Agg(...)) and resolved these two parts in different rules. The 
> CUBE and ROLLUP have the same issue.
> Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server 
> 2017.
>  
> For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested.
> {code:java}
> spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > 
> 10;
> Time taken: 0.194 seconds
> hive> select sum(a) as b from t group by b grouping sets(b) having b > 10;
> 2
> Time taken: 1.605 seconds, Fetched: 1 row(s) {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-31663) Grouping sets with having clause returns the wrong result

Reply via email to