[ https://issues.apache.org/jira/browse/SPARK-31334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

angerszhu resolved SPARK-31334.
-------------------------------
    Resolution: Fixed

> Using agg column in HAVING clause behaves differently depending on column type
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-31334
>                 URL: https://issues.apache.org/jira/browse/SPARK-31334
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: angerszhu
>            Priority: Major
>
> {code:scala}
> test("xxxxxxxx") {
>     Seq(
>       (1, 3),
>       (2, 3),
>       (3, 6),
>       (4, 7),
>       (5, 9),
>       (6, 9)
>     ).toDF("a", "b").createOrReplaceTempView("testData")
>     val x = sql(
>       """
>         | SELECT b, sum(a) as a
>         | FROM testData
>         | GROUP BY b
>         | HAVING sum(a) > 3
>       """.stripMargin)
>     x.explain()
>     x.show()
>   }
> [info] - xxxxxxxx *** FAILED *** (508 milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Resolved attribute(s) a#184 
> missing from a#180,b#181 in operator !Aggregate [b#181], [b#181, 
> sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]. 
> Attribute(s) with the same name appear in the operation: a. Please check if 
> the right attribute(s) are used.;;
> [info] Project [b#181, a#184]
> [info] +- Filter (sum(a#184)#188 > cast(3 as double))
> [info]    +- !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, 
> sum(a#184) AS sum(a#184)#188]
> [info]       +- SubqueryAlias `testdata`
> [info]          +- Project [_1#177 AS a#180, _2#178 AS b#181]
> [info]             +- LocalRelation [_1#177, _2#178]
> {code}
> {code:scala}
> test("xxxxxxxx") {
>     Seq(
>       ("1", "3"),
>       ("2", "3"),
>       ("3", "6"),
>       ("4", "7"),
>       ("5", "9"),
>       ("6", "9")
>     ).toDF("a", "b").createOrReplaceTempView("testData")
>     val x = sql(
>       """
>         | SELECT b, sum(a) as a
>         | FROM testData
>         | GROUP BY b
>         | HAVING sum(a) > 3
>       """.stripMargin)
>     x.explain()
>     x.show()
>   }
> == Physical Plan ==
> *(2) Project [b#181, a#184L]
> +- *(2) Filter (isnotnull(sum(cast(a#180 as bigint))#197L) && (sum(cast(a#180 
> as bigint))#197L > 3))
>    +- *(2) HashAggregate(keys=[b#181], functions=[sum(cast(a#180 as bigint))])
>       +- Exchange hashpartitioning(b#181, 5)
>          +- *(1) HashAggregate(keys=[b#181], 
> functions=[partial_sum(cast(a#180 as bigint))])
>             +- *(1) Project [_1#177 AS a#180, _2#178 AS b#181]
>                +- LocalTableScan [_1#177, _2#178]
> {code}
> I spent a lot of time but could not find which analyzer rule causes this difference:
> when the column type is double, the query fails with the AnalysisException above, while the string-typed variant runs fine.
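> A minimal workaround sketch, assuming the failure comes from the output alias "a" shadowing the input column "a" when the HAVING expression is re-resolved: give the aggregate a different alias (the name "total" below is hypothetical, not part of the original report) so that sum(a) in HAVING can only bind to testData.a.
> {code:scala}
> // Hedged workaround sketch (assumes a SparkSession named `spark`, e.g. in spark-shell,
> // so that toDF and spark.sql are available). The aggregate is aliased to "total"
> // instead of "a", so it no longer shadows the input column referenced in HAVING.
> import spark.implicits._
>
> Seq((1, 3), (2, 3), (3, 6), (4, 7), (5, 9), (6, 9))
>   .toDF("a", "b")
>   .createOrReplaceTempView("testData")
>
> val workaround = spark.sql(
>   """
>     | SELECT b, sum(a) AS total
>     | FROM testData
>     | GROUP BY b
>     | HAVING sum(a) > 3
>   """.stripMargin)
> workaround.show()
> {code}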


