[
https://issues.apache.org/jira/browse/SPARK-20270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-20270:
------------------------------------
Assignee: Apache Spark (was: DB Tsai)
> na.fill will change the values in long or integer when the default value is
> in double
> -------------------------------------------------------------------------------------
>
> Key: SPARK-20270
> URL: https://issues.apache.org/jira/browse/SPARK-20270
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
> Reporter: DB Tsai
> Assignee: Apache Spark
> Priority: Critical
>
> This bug was partially addressed in SPARK-18555, but the root cause was not
> completely fixed. The bug is critical for us because it changes Long member
> ids in our application whenever an id is too large to be represented
> losslessly as a Double.
> Here is an example of how this happens. With
> {code}
> Seq[(java.lang.Long, java.lang.Double)](
>   (null, 3.14),
>   (9123146099426677101L, null),
>   (9123146560113991650L, 1.6),
>   (null, null)
> ).toDF("a", "b").na.fill(0.2)
> {code}
> the logical plan will be
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [cast(coalesce(cast(a#232L as double), cast(0.2 as double)) as
> bigint) AS a#240L, cast(coalesce(nanvl(b#233, cast(null as double)), 0.2) as
> double) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
> +- LocalRelation [_1#229L, _2#230]
> {code}
> Note that even when the value is not null, Spark first casts the Long to
> Double and then casts the result of the coalesce back to Long, which loses
> precision.
> The expected behavior is that a non-null original value is left unchanged,
> but Spark changes it, which is wrong.
> With the PR, the logical plan will be
> {code}
> == Analyzed Logical Plan ==
> a: bigint, b: double
> Project [coalesce(a#232L, cast(0.2 as bigint)) AS a#240L,
> coalesce(nanvl(b#233, cast(null as double)), cast(0.2 as double)) AS b#241]
> +- Project [_1#229L AS a#232L, _2#230 AS b#233]
> +- LocalRelation [_1#229L, _2#230]
> {code}
> which behaves correctly without changing the original Long values.
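
The difference between the two plans can be illustrated without Spark. The sketch below is plain Scala and only mimics the expressions in the plans above: `NaFillSketch`, `fillOld`, and `fillNew` are hypothetical names introduced for illustration, and `Option[Long]` stands in for a nullable bigint column. It assumes that Spark's double-to-bigint cast behaves like Scala's `toLong` for these values.

```scala
object NaFillSketch {
  // Old plan: cast(coalesce(cast(a as double), cast(fill as double)) as bigint).
  // Every value, null or not, takes a Long -> Double -> Long round trip.
  def fillOld(a: Option[Long], fill: Double): Long =
    a.map(_.toDouble).getOrElse(fill).toLong

  // Plan with the PR: coalesce(a, cast(fill as bigint)).
  // Only the fill value is cast; a non-null Long passes through untouched.
  def fillNew(a: Option[Long], fill: Double): Long =
    a.getOrElse(fill.toLong)

  def main(args: Array[String]): Unit = {
    // The two member ids come from the example in the issue description.
    val ids = Seq(Some(9123146099426677101L), Some(9123146560113991650L), None)
    ids.foreach { id =>
      println(s"input=$id old=${fillOld(id, 0.2)} new=${fillNew(id, 0.2)}")
    }
  }
}
```

Both ids lie between 2^62 and 2^63, where adjacent Double values are 1024 apart, and neither id is a multiple of 1024, so the round trip in `fillOld` cannot return the original value. For a null input both variants yield 0, because 0.2 truncates to 0 when cast to a Long.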