[ https://issues.apache.org/jira/browse/SPARK-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-11894:
------------------------------------
Assignee: (was: Apache Spark)
> Incorrect results are returned when using null
> ----------------------------------------------
>
> Key: SPARK-11894
> URL: https://issues.apache.org/jira/browse/SPARK-11894
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Xiao Li
>
> In the Dataset API, the following two datasets are treated as identical, even though the first key is 0 in one and null in the other:
> Seq((new java.lang.Integer(0), "1"), (new java.lang.Integer(22), "2")).toDS()
> Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> Note: java.lang.Integer is nullable, so null is a legitimate key value that should be preserved.
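A minimal plain-Scala sketch (no Spark involved) of how a null key can surface as 0: java.lang.Integer is a reference type and may hold null, but a primitive Int cannot, and forcing a null into Int silently yields the primitive default 0. That Spark's encoder loses the null in exactly this way is an assumption suggested by the reported output, not something this report confirms. The object name NullableBoxedInt is hypothetical.

```scala
// Hypothetical illustration: boxed Integer vs. primitive Int and null.
object NullableBoxedInt {
  // java.lang.Integer is a reference type, so null is a legal value.
  val boxed: java.lang.Integer = null

  // Null-safe conversion: keeps "no value" distinct from 0.
  val asOption: Option[Int] = Option(boxed).map(_.intValue)

  // Forcing the null into a primitive Int unboxes it to the default value 0,
  // the same value that replaces null in the reported (incorrect) results.
  val forced: Int = (boxed: Any).asInstanceOf[Int]

  def main(args: Array[String]): Unit =
    println(s"boxed=$boxed option=$asOption forced=$forced")
    // prints: boxed=null option=None forced=0
}
```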
> This produces incorrect results. For example (in spark-shell, where sqlContext.implicits._ is already imported):
> import org.apache.spark.sql.functions.lit
> val ds1 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> val ds2 = Seq((null.asInstanceOf[java.lang.Integer], "1"), (new java.lang.Integer(22), "2")).toDS()
> val res1 = ds1.joinWith(ds2, lit(true)).collect()
> The expected result is:
> ((null,1),(null,1))
> ((22,2),(null,1))
> ((null,1),(22,2))
> ((22,2),(22,2))
> The actual result, in which every null key has been replaced by 0, is:
> ((0,1),(0,1))
> ((22,2),(0,1))
> ((0,1),(22,2))
> ((22,2),(22,2))
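The expected output above can be reproduced with a plain-Scala model of the cross join: a join on lit(true) pairs every row of ds1 with every row of ds2. This is a hedged sketch, not Spark code. It models the nullable key as Option[Int] (None stands for the null java.lang.Integer), the object name ExpectedCrossJoin is hypothetical, and Spark itself does not guarantee the row order returned by collect().

```scala
// Hypothetical model of the expected joinWith(..., lit(true)) result.
object ExpectedCrossJoin {
  // Rows of ds1 and ds2 from the report; None represents a null key.
  val ds1: Seq[(Option[Int], String)] = Seq((None, "1"), (Some(22), "2"))
  val ds2: Seq[(Option[Int], String)] = Seq((None, "1"), (Some(22), "2"))

  // A join condition that is always true keeps every pair of rows
  // (a Cartesian product). Iterating ds2 in the outer loop reproduces
  // the order in which the report lists the expected rows.
  val expected: Seq[((Option[Int], String), (Option[Int], String))] =
    for {
      right <- ds2
      left  <- ds1
    } yield (left, right)

  def main(args: Array[String]): Unit =
    expected.foreach(println)
}
```

Note that no key in the model ever becomes 0: the null keys survive the join as None.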
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)