[ https://issues.apache.org/jira/browse/SPARK-26706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai reassigned SPARK-26706:
-------------------------------
Assignee: Anton Okolnychyi
> Fix Cast$mayTruncate for bytes
> ------------------------------
>
> Key: SPARK-26706
> URL: https://issues.apache.org/jira/browse/SPARK-26706
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.3, 2.4.1, 3.0.0
> Reporter: Anton Okolnychyi
> Assignee: Anton Okolnychyi
> Priority: Major
>
> The logic in {{Cast$mayTruncate}} is broken for bytes.
> Right now, {{mayTruncate(LongType, ByteType)}} returns {{false}} while
> {{mayTruncate(LongType, ShortType)}} returns {{true}}. Consequently,
> {{spark.range(1, 3).as[Byte]}} and {{spark.range(1, 3).as[Short]}}
> behave differently: the first passes analysis, while the second is rejected.
> This bug can silently corrupt data.
> {code}
> // executes silently even though Long is converted into Byte
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
>   .map(b => b - 1)
>   .show()
> +-----+
> |value|
> +-----+
> |  -12|
> |  -11|
> |  -10|
> |   -9|
> |   -8|
> |   -7|
> |   -6|
> |   -5|
> |   -4|
> |   -3|
> +-----+
> // throws an AnalysisException: Cannot up cast `id` from bigint to smallint as it may truncate
> spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
>   .map(s => s - 1)
>   .show()
> {code}
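
The values printed in the Byte case can be reproduced without Spark. The following is a minimal sketch in plain Scala, assuming the cast behaves like a standard two's-complement narrowing that keeps only the lowest 8 bits of each Long. The names `ByteTruncationDemo` and `narrowAndDecrement` are hypothetical and exist only for this illustration; they are not part of Spark.

```scala
object ByteTruncationDemo {
  // Narrow a Long to a Byte (only the lowest 8 bits survive), then subtract 1,
  // mirroring the `.as[Byte].map(b => b - 1)` pipeline from the report.
  def narrowAndDecrement(id: Long): Int = id.toByte - 1

  def main(args: Array[String]): Unit = {
    // Same ten ids as spark.range(Long.MaxValue - 10, Long.MaxValue).
    val ids = (0L until 10L).map(offset => Long.MaxValue - 10 + offset)

    // Each id is far outside the Byte range, so the narrowing discards
    // almost all of its bits and the result bears no relation to the input.
    ids.foreach(id => println(s"$id -> ${narrowAndDecrement(id)}"))
  }
}
```

The ids are close to Long.MaxValue, yet the results land between -12 and -3, which is the kind of silent corruption the analyzer check is meant to prevent.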
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)