[ https://issues.apache.org/jira/browse/SPARK-31334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-31334:
------------------------------------

    Assignee: Apache Spark

> Use agg column in Having clause behave different with column type
> ------------------------------------------------------------------
>
>                 Key: SPARK-31334
>                 URL: https://issues.apache.org/jira/browse/SPARK-31334
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: angerszhu
>            Assignee: Apache Spark
>            Priority: Major
>
> {code:java}
> ```
> test("xxxxxxxx") {
>   Seq(
>     (1, 3),
>     (2, 3),
>     (3, 6),
>     (4, 7),
>     (5, 9),
>     (6, 9)
>   ).toDF("a", "b").createOrReplaceTempView("testData")
>   val x = sql(
>     """
>       | SELECT b, sum(a) as a
>       | FROM testData
>       | GROUP BY b
>       | HAVING sum(a) > 3
>     """.stripMargin)
>   x.explain()
>   x.show()
> }
>
> [info] - xxxxxxxx *** FAILED *** (508 milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Resolved attribute(s) a#184 missing from a#180,b#181 in operator !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]. Attribute(s) with the same name appear in the operation: a.
> Please check if the right attribute(s) are used.;;
> [info]   Project [b#181, a#184]
> [info]   +- Filter (sum(a#184)#188 > cast(3 as double))
> [info]      +- !Aggregate [b#181], [b#181, sum(cast(a#180 as double)) AS a#184, sum(a#184) AS sum(a#184)#188]
> [info]         +- SubqueryAlias `testdata`
> [info]            +- Project [_1#177 AS a#180, _2#178 AS b#181]
> [info]               +- LocalRelation [_1#177, _2#178]
> ```
>
> ```
> test("xxxxxxxx") {
>   Seq(
>     ("1", "3"),
>     ("2", "3"),
>     ("3", "6"),
>     ("4", "7"),
>     ("5", "9"),
>     ("6", "9")
>   ).toDF("a", "b").createOrReplaceTempView("testData")
>   val x = sql(
>     """
>       | SELECT b, sum(a) as a
>       | FROM testData
>       | GROUP BY b
>       | HAVING sum(a) > 3
>     """.stripMargin)
>   x.explain()
>   x.show()
> }
>
> == Physical Plan ==
> *(2) Project [b#181, a#184L]
> +- *(2) Filter (isnotnull(sum(cast(a#180 as bigint))#197L) && (sum(cast(a#180 as bigint))#197L > 3))
>    +- *(2) HashAggregate(keys=[b#181], functions=[sum(cast(a#180 as bigint))])
>       +- Exchange hashpartitioning(b#181, 5)
>          +- *(1) HashAggregate(keys=[b#181], functions=[partial_sum(cast(a#180 as bigint))])
>             +- *(1) Project [_1#177 AS a#180, _2#178 AS b#181]
>                +- LocalTableScan [_1#177, _2#178]
> ```{code}
> I spent a lot of time but could not find which analyzer rule makes the behavior differ: when the aggregate is resolved as double (the first test), analysis fails with the AnalysisException above, while the second test resolves and produces a valid physical plan.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
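A possible workaround, not taken from the ticket itself but a minimal sketch under the assumption that the failure comes from the `HAVING` condition resolving against the alias `a` (which shadows the input column of the same name): either give the aggregate a non-conflicting alias, or push the `HAVING` filter into an outer query so it resolves against the subquery's output attribute.

```scala
// Sketch of a workaround (hypothetical, assumes an existing SparkSession
// `spark` and the `testData` temp view from the ticket's reproduction).
// Option 1: avoid reusing the input column name as the aggregate alias.
val renamed = spark.sql(
  """
    | SELECT b, sum(a) AS total
    | FROM testData
    | GROUP BY b
    | HAVING sum(a) > 3
  """.stripMargin)

// Option 2: rewrite HAVING as WHERE over a subquery, so the filter
// references the subquery's output column rather than the shadowed input.
val subquery = spark.sql(
  """
    | SELECT b, a FROM (
    |   SELECT b, sum(a) AS a
    |   FROM testData
    |   GROUP BY b
    | ) t
    | WHERE a > 3
  """.stripMargin)
```

Both rewrites express the same query; whether they sidestep the reported AnalysisException on the affected versions would need to be verified against 2.4.0/3.0.0.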