[
https://issues.apache.org/jira/browse/SPARK-18489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673469#comment-15673469
]
Bipul Kumar edited comment on SPARK-18489 at 11/17/16 11:35 AM:
----------------------------------------------------------------
[~prashant_] [[email protected]] please review this.
was (Author: dasbipulkumar):
[~prashant_] please review this.
> Implicit type conversion during comparison between Integer type column and
> String type column
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-18489
> URL: https://issues.apache.org/jira/browse/SPARK-18489
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Bipul Kumar
>
> Suppose I have a dataframe with schema:
> root
> |-- _c0: integer (nullable = true)
> |-- _c1: double (nullable = true)
> |-- _c2: string (nullable = true)
> and data:
> +---+---+----+
> |_c0|_c1| _c2|
> +---+---+----+
> |  1|1.0|   1|
> |  2|1.0|   s|
> |  3|3.1|null|
> +---+---+----+
> If the following operations are carried out:
> df.where("_c1==_c2").show
> +---+---+---+
> |_c0|_c1|_c2|
> +---+---+---+
> |  1|1.0|  1|
> +---+---+---+
> df.where("_c1<>_c2").show or df.where("_c1!=_c2").show
> +---+---+---+
> |_c0|_c1|_c2|
> +---+---+---+
> +---+---+---+
> So the results of these related operations are inconsistent: row 2
> (_c1 = 1.0, _c2 = s) is returned by neither the equality filter nor the
> inequality filter.
> Here the string values that hold numbers are implicitly cast, while the
> other values are silently ignored instead of raising an exception.
> In my view this can lead to incorrect results if the dataset is not
> carefully inspected.
> Also, the SQL-99 standard discourages implicit casting to avoid such problems.
> https://users.dcc.uchile.cl/~cgutierr/cursos/BD/standards.pdf
> The same implicit casting also applies to UDFs and aggregate functions.
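
The results quoted above can be reproduced with a small plain-Python model. This is only a sketch, not Spark code: it assumes that the string column is implicitly cast to double, that a failed cast yields NULL rather than an error, and that a filter keeps only rows whose predicate is strictly true. The helper names (cast_to_double, sql_eq, sql_ne, where) are hypothetical and exist only for illustration.

```python
from typing import Optional

# The three rows from the report: (_c0, _c1, _c2).
ROWS = [(1, 1.0, "1"), (2, 1.0, "s"), (3, 3.1, None)]


def cast_to_double(value: Optional[str]) -> Optional[float]:
    """Assumed lenient cast: missing or unparseable input becomes None (NULL)."""
    if value is None:
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None


def sql_eq(a: Optional[float], b: Optional[float]) -> Optional[bool]:
    """Three-valued equality: any NULL operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_ne(a: Optional[float], b: Optional[float]) -> Optional[bool]:
    """Three-valued inequality: any NULL operand makes the result NULL."""
    if a is None or b is None:
        return None
    return a != b


def where(rows, predicate):
    """Keep only rows whose predicate is strictly True (NULL and False are dropped)."""
    return [row for row in rows if predicate(row) is True]


equal_rows = where(ROWS, lambda r: sql_eq(r[1], cast_to_double(r[2])))
not_equal_rows = where(ROWS, lambda r: sql_ne(r[1], cast_to_double(r[2])))

print(equal_rows)      # [(1, 1.0, '1')]
print(not_equal_rows)  # []
```

Under these assumptions, rows 2 and 3 evaluate to NULL for both predicates, so neither filter returns them. That matches the output in the report and shows why the equality and inequality results do not partition the dataset.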