Bipul Kumar created SPARK-18489:
-----------------------------------
Summary: Implicit type conversion during comparison between
Integer type column and String type column
Key: SPARK-18489
URL: https://issues.apache.org/jira/browse/SPARK-18489
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Bipul Kumar
Suppose I have a dataframe with schema:
root
|-- _c0: integer (nullable = true)
|-- _c1: double (nullable = true)
|-- _c2: string (nullable = true)
and data:
+---+---+----+
|_c0|_c1| _c2|
+---+---+----+
|  1|1.0|   1|
|  2|1.0|   s|
|  3|3.1|null|
+---+---+----+
If the following operations are carried out:
df.where("_c1==_c2").show
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
|  1|1.0|  1|
+---+---+---+
df.where("_c1<>_c2").show or df.where("_c1!=_c2").show
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
+---+---+---+
So the results of these operations are ambiguous: rows 2 and 3 appear in
neither the equality result nor the inequality result.
Here the string values that look numeric are implicitly cast, while the
others are silently dropped instead of raising an exception.
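To make the behaviour above concrete, here is a minimal plain-Scala sketch (no Spark needed) of what appears to be happening. It assumes the string side is cast to double, that a failed cast yields NULL rather than an error, and that a filter keeps a row only when the predicate is definitely true. The object ImplicitCastModel and all of its members are hypothetical names made up for this illustration, not Spark APIs.

```scala
import scala.util.Try

// Hypothetical model of the behaviour in this report; not Spark code.
object ImplicitCastModel {
  // Models the implicit string-to-double cast: null or non-numeric input
  // becomes None (standing in for SQL NULL) instead of raising an error.
  // Simplification: Java's number parser accepts a few forms Spark may not.
  def castToDouble(s: String): Option[Double] =
    Option(s).flatMap(v => Try(v.trim.toDouble).toOption)

  // Three-valued equality: if either side is NULL, the result is NULL (None).
  def sqlEq(a: Option[Double], b: Option[Double]): Option[Boolean] =
    for (x <- a; y <- b) yield x == y

  // Three-valued inequality: the negation of NULL is still NULL.
  def sqlNeq(a: Option[Double], b: Option[Double]): Option[Boolean] =
    sqlEq(a, b).map(!_)

  // A where-clause keeps a row only when the predicate is definitely true.
  def keeps(p: Option[Boolean]): Boolean = p.contains(true)

  // The three rows from the report: (_c0, _c1, _c2).
  val rows: Seq[(Int, Double, String)] =
    Seq((1, 1.0, "1"), (2, 1.0, "s"), (3, 3.1, null))

  // _c0 values that survive "_c1 == _c2".
  val eqIds: Seq[Int] = rows.collect {
    case (id, d, s) if keeps(sqlEq(Some(d), castToDouble(s))) => id
  }

  // _c0 values that survive "_c1 <> _c2".
  val neqIds: Seq[Int] = rows.collect {
    case (id, d, s) if keeps(sqlNeq(Some(d), castToDouble(s))) => id
  }
}
```

Under these assumptions eqIds contains only row 1 and neqIds is empty, which matches the two outputs shown above: for rows 2 and 3 the cast produces NULL, so neither predicate is ever true.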
In my view this can lead to incorrect results if the dataset is not
carefully inspected.
Also, the SQL-99 standard discourages implicit casting to avoid such problems:
https://users.dcc.uchile.cl/~cgutierr/cursos/BD/standards.pdf
The same implicit casting also applies to UDFs and aggregate functions.