[jira] [Created] (SPARK-18489) Implicit type conversion during comparison between Integer type column and String type column

Bipul Kumar (JIRA) Thu, 17 Nov 2016 03:30:08 -0800

Bipul Kumar created SPARK-18489:
-----------------------------------

             Summary: Implicit type conversion during comparison between 
Integer type column and String type column
                 Key: SPARK-18489
                 URL: https://issues.apache.org/jira/browse/SPARK-18489
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Bipul Kumar



Suppose I have a dataframe with schema:
root
 |-- _c0: integer (nullable = true)
 |-- _c1: double (nullable = true)
 |-- _c2: string (nullable = true)


and data:
+---+---+----+
|_c0|_c1| _c2|
+---+---+----+
|  1|1.0|   1|
|  2|1.0|   s|
|  3|3.1|null|
+---+---+----+
If the following operations are carried out:
df.where("_c1==_c2").show
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
|  1|1.0|  1|
+---+---+---+

df.where("_c1<>_c2").show   or   df.where("_c1!=_c2").show 
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
+---+---+---+
So the results of these related operations are inconsistent: rows 2 and 3 
are returned by neither the equality filter nor the inequality filter.
Here the numeric strings are implicitly cast, while the other values are 
silently ignored instead of an exception being thrown.
In my view, this can lead to incorrect results if the dataset is not 
carefully inspected.
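
The output above is consistent with the following model, which is an assumption on my part and not a confirmed description of Spark internals: the string column is cast to double, a failed cast yields null, a comparison involving null evaluates to null, and where keeps only rows whose predicate is strictly true. The sketch below is plain Python, not Spark code; every name in it (cast_to_double, sql_eq, sql_ne, where) is hypothetical and exists only for illustration.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, float, Optional[str]]  # (_c0, _c1, _c2), as in the report

ROWS: List[Row] = [
    (1, 1.0, "1"),
    (2, 1.0, "s"),
    (3, 3.1, None),
]


def cast_to_double(value: Optional[str]) -> Optional[float]:
    """Assumed lenient cast: null for null input or non-numeric text."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def sql_eq(left: Optional[float], right: Optional[float]) -> Optional[bool]:
    """Three-valued equality: null if either operand is null."""
    if left is None or right is None:
        return None
    return left == right


def sql_ne(left: Optional[float], right: Optional[float]) -> Optional[bool]:
    """Three-valued inequality: the negation of sql_eq, with null preserved."""
    eq = sql_eq(left, right)
    return None if eq is None else not eq


def where(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep only rows whose predicate is strictly True (null rows are dropped)."""
    return [row for row in rows if predicate(row) is True]


equal_rows = where(ROWS, lambda r: sql_eq(r[1], cast_to_double(r[2])))
not_equal_rows = where(ROWS, lambda r: sql_ne(r[1], cast_to_double(r[2])))

print(equal_rows)
print(not_equal_rows)
```

Under these assumptions, the equality filter keeps only row 1 and the inequality filter keeps nothing, because the predicate is null for rows 2 and 3 in both cases. That matches the two outputs shown above.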

Also, the SQL-99 standard discourages implicit casting to avoid such issues.
https://users.dcc.uchile.cl/~cgutierr/cursos/BD/standards.pdf

The same implicit casting also occurs with UDFs and aggregation functions.






