[ https://issues.apache.org/jira/browse/SPARK-41391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruifeng Zheng updated SPARK-41391:
----------------------------------
    Description:
scala> val df = spark.range(1, 10).withColumn("value", lit(1))
df: org.apache.spark.sql.DataFrame = [id: bigint, value: int]

scala> df.createOrReplaceTempView("table")

scala> df.groupBy("id").agg(count_distinct($"value"))
res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint]

scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ")
res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): bigint]

scala> df.groupBy("id").agg(count_distinct($"*"))
res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): bigint]

scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ")
res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, value): bigint]

> The output column name of `groupBy.agg(count_distinct)` is incorrect
> --------------------------------------------------------------------
>
>                 Key: SPARK-41391
>                 URL: https://issues.apache.org/jira/browse/SPARK-41391
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.3.0, 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
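
The session in the description can be packaged as a small standalone program for checking the behaviour on a given Spark build. The following is only a sketch under assumptions: the object name `CountDistinctNames`, the local master setting, and the view name `t` (used in place of `table` to avoid any keyword clash) are illustrative and not part of the report. The explicit `.as(...)` alias at the end is one possible way to pin the column name on affected versions, not a fix proposed in the issue.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count_distinct, lit}

// Hypothetical driver object; the name is an assumption for illustration.
object CountDistinctNames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")              // assumption: local run, no cluster
      .appName("SPARK-41391-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the report: ids 1..9 with a constant `value` column.
    val df = spark.range(1, 10).withColumn("value", lit(1))
    df.createOrReplaceTempView("t")    // `t` instead of `table` (assumption)

    // DataFrame API and SQL forms of the same aggregation.
    val viaApi = df.groupBy("id").agg(count_distinct($"value"))
    val viaSql = spark.sql("SELECT id, COUNT(DISTINCT value) FROM t GROUP BY id")

    // On the affected versions the report shows these two names differ.
    println(s"API name: ${viaApi.columns(1)}")
    println(s"SQL name: ${viaSql.columns(1)}")

    // Possible workaround: name the output column explicitly.
    val aliased = df.groupBy("id")
      .agg(count_distinct($"value").as("count(DISTINCT value)"))
    println(s"Aliased name: ${aliased.columns(1)}")

    spark.stop()
  }
}
```

An explicit alias makes the output name independent of how Spark derives default names, so downstream code that selects the column by name does not depend on the behaviour described in this issue.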