[ https://issues.apache.org/jira/browse/SPARK-32136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-32136.
----------------------------------
Fix Version/s: 3.1.0, 3.0.1
Resolution: Fixed
Issue resolved by pull request 28962
[https://github.com/apache/spark/pull/28962]
> Spark producing incorrect groupBy results when key is a struct with nullable
> properties
> ---------------------------------------------------------------------------------------
>
> Key: SPARK-32136
> URL: https://issues.apache.org/jira/browse/SPARK-32136
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Jason Moore
> Assignee: L. C. Hsieh
> Priority: Blocker
> Labels: correctness
> Fix For: 3.0.1, 3.1.0
>
>
> I'm in the process of migrating from Spark 2.4.x to Spark 3.0.0 and I'm
> noticing a behaviour change in a particular aggregation we're doing, and I
> think I've tracked it down to how Spark is now treating nullable properties
> within the column being grouped by.
>
> Here's a simple test I've been able to set up to repro it:
>
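> {code:scala}
> // Needed outside spark-shell, which imports these automatically;
> // `spark` is the active SparkSession.
> import org.apache.spark.sql.functions.count
> import spark.implicits._
>
> case class B(c: Option[Double])
> case class A(b: Option[B])
>
> // Three rows: a null struct, a struct with a null field, and a populated struct.
> val df = Seq(
>   A(None),
>   A(Some(B(None))),
>   A(Some(B(Some(1.0))))
> ).toDF
>
> // Group by the nullable struct column and count the rows per key.
> val res = df.groupBy("b").agg(count("*"))
> {code}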
> Spark 2.4.6 has the expected result:
> {noformat}
> > res.show
> +-----+--------+
> | b|count(1)|
> +-----+--------+
> | []| 1|
> | null| 1|
> |[1.0]| 1|
> +-----+--------+
> > res.collect.foreach(println)
> [[null],1]
> [null,1]
> [[1.0],1]
> {noformat}
> But Spark 3.0.0 has an unexpected result:
> {noformat}
> > res.show
> +-----+--------+
> | b|count(1)|
> +-----+--------+
> | []| 2|
> |[1.0]| 1|
> +-----+--------+
> > res.collect.foreach(println)
> [[null],2]
> [[1.0],1]
> {noformat}
> Notice how it has keyed one of the values in `b` as `[null]`; that is, an
> instance of B with a null value for the `c` property instead of a null for
> the overall value itself.
> Is this an intended change?
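
The grouping semantics the reporter expects can be sketched without Spark at all. The following is a minimal illustration using plain Scala collections, reusing the case classes from the report; the object name `GroupByExpectation` and its members are hypothetical and are not part of the issue or of the fix in pull request 28962. It only shows that `None` (a null struct) and `Some(B(None))` (a struct whose field is null) are distinct keys, which matches the Spark 2.4.6 output above.

```scala
// Plain Scala collections only; no Spark dependency.
// The case classes mirror the ones in the report.
case class B(c: Option[Double])
case class A(b: Option[B])

// Hypothetical helper object, used only for this illustration.
object GroupByExpectation {
  // The same three rows as in the repro.
  val rows: Seq[A] = Seq(
    A(None),               // the struct itself is absent
    A(Some(B(None))),      // the struct is present, its field is absent
    A(Some(B(Some(1.0))))  // the struct and its field are both present
  )

  // Group on the Option[B] key and count the rows per key.
  val counts: Map[Option[B], Int] =
    rows.groupBy(_.b).map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit = {
    // Three distinct keys are expected, each with a single row.
    assert(counts.size == 3)
    counts.foreach(println)
  }
}
```

Under these assumptions each of the three keys maps to a count of 1, whereas the Spark 3.0.0 result reported above merges the first two keys into one group.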