Marco Gaido commented on SPARK-23498:

I think we are seeing many of these issues with implicit casting. There has 
been a lot of work on how implicit casting should behave, but it seems that we 
are not able to find a way to make it work without issues.

Honestly, from what I have seen so far, I'd consider changing the paradigm and 
doing what Postgres does, i.e. if the datatypes are different, just throw an 
exception and force the user to explicitly cast to the proper datatypes. This 
might be a bit tedious in some cases, but it enforces stability and avoids 
unexpected results.
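For illustration, the accuracy loss in the quoted example can be sketched outside Spark. This is a hypothetical Python mock-up of the cast semantics, not Spark's actual implementation:

```python
# Sketch (assumption: plain Python, not Spark code) of why casting the
# string operand to int makes '1000.1' > 1000 evaluate to false.

def compare_as_int(s: str, n: int) -> bool:
    # Mimics CAST('1000.1' AS INT): the fractional part is truncated.
    return int(float(s)) > n

def compare_as_double(s: str, n: int) -> bool:
    # Widening both sides to double preserves the fractional part.
    return float(s) > float(n)

print(compare_as_int("1000.1", 1000))     # False: 1000 > 1000
print(compare_as_double("1000.1", 1000))  # True: 1000.1 > 1000.0
```

Under the Postgres-style paradigm suggested above, neither cast would be inserted implicitly: the mixed-type comparison would simply raise an error until the user casts one side explicitly.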

[~hyukjin.kwon] [~smilegator] [~cloud_fan]  what do you think?

> Accuracy problem in comparison with string and integer
> ------------------------------------------------------
>                 Key: SPARK-23498
>                 URL: https://issues.apache.org/jira/browse/SPARK-23498
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0, 2.2.1, 2.3.0
>            Reporter: Kevin Zhang
>            Priority: Major
> When comparing a string column with an integer value, Spark SQL will 
> automatically cast the string operand to int, so the following SQL returns 
> true in Hive but false in Spark
> {code:java}
> select '1000.1'>1000
> {code}
> From the physical plan we can see the string operand was cast to int, which 
> caused the accuracy loss
> {code:java}
> *Project [false AS (CAST(1000.1 AS INT) > 1000)#4]
> +- Scan OneRowRelation[]
> {code}
> To solve it, casting both operands of the binary operator to a wider common 
> type like double may be safe.

This message was sent by Atlassian JIRA
