Furcy Pin commented on SPARK-17592:

SQL Server: the query fails (like PostgreSQL).

Oracle SQL: I did not manage to get a working example.

BTW, my mistake: in the description I said Hive and MySQL were doing 
floor(string.toDouble). I was wrong: they do string.toDouble.toInt, which 
truncates toward zero. The two differ for negative numbers!
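Here is a minimal Python sketch of the three candidate semantics, to make the difference concrete. Python is used only for illustration, and the function names are hypothetical. `int(float(s))` stands in for `string.toDouble.toInt` because both truncate toward zero. The half-up rounding function models what the description observed from Spark; it is not taken from Spark's source.

```python
import math

def cast_floor(s: str) -> int:
    """floor(string.toDouble): rounds toward negative infinity."""
    return math.floor(float(s))

def cast_truncate(s: str) -> int:
    """string.toDouble.toInt: int() truncates toward zero, as Scala's Double.toInt does."""
    return int(float(s))

def cast_round_half_up(s: str) -> int:
    """Half-up rounding, approximating Java's Math.round: the behavior the
    description attributes to Spark. Python's built-in round() rounds half to
    even, so floor(x + 0.5) is used instead."""
    return math.floor(float(s) + 0.5)

for s in ["0.4", "0.5", "0.6", "-0.4", "-0.5", "-0.6"]:
    print(f"{s:>5}  floor={cast_floor(s):>2}  "
          f"truncate={cast_truncate(s):>2}  round={cast_round_half_up(s):>2}")
```

Floor and truncation agree for non-negative inputs. For "-0.4", floor gives -1 while truncation gives 0.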

I would say the expected behavior should be either to fail or to use 
.toDouble.toInt, as I know few languages (if any) where floats are rounded 
when cast to integers.

Personally, I don't think crashing a nightly production ETL job because 
someone somewhere added "0.4" to your data would be a good idea, which is why I 
think .toDouble.toInt is best. Spark apparently takes the same view, since in 
Spark CAST("a" AS INT) returns NULL rather than failing.

Another option would be for CAST("0.4" AS INT) to return NULL as well. That 
would not break jobs, but it would break compatibility with Hive and would 
still differ from the other SQL engines compared here.
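The three options discussed above (fail, truncate, or return NULL) can be sketched in Python as follows. This is illustrative only:

- the function names are hypothetical;
- `None` plays the role of SQL NULL;
- Python's `int()`/`float()` parsing stands in for each engine's string parsing but is not identical to it (for example, it accepts surrounding whitespace and "inf").

```python
from typing import Optional

def cast_strict(s: str) -> int:
    """Fail: accept only integer literals, as PostgreSQL and SQL Server do here.
    Raises ValueError for "0.4" as well as for "a"."""
    return int(s)

def cast_truncate_or_null(s: str) -> Optional[int]:
    """Proposed: parse as a double and truncate toward zero (string.toDouble.toInt);
    return None (NULL) only when the string cannot be converted at all."""
    try:
        return int(float(s))
    except (ValueError, OverflowError):  # "a", "nan" -> ValueError; "inf" -> OverflowError
        return None

def cast_integer_or_null(s: str) -> Optional[int]:
    """Alternative: return None (NULL) for any string that is not an integer literal."""
    try:
        return int(s)
    except ValueError:
        return None

for s in ["42", "0.4", "-0.6", "a"]:
    try:
        strict = cast_strict(s)
    except ValueError:
        strict = "error"
    print(s, strict, cast_truncate_or_null(s), cast_integer_or_null(s))
```

Only the strict variant can abort a job. The other two differ only in whether "0.4" becomes 0 or NULL.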

And isn't compatibility with Hive the best way for Spark SQL to help people 
migrate from Hive to Spark?

> SQL: CAST string as INT inconsistent with Hive
> ----------------------------------------------
>                 Key: SPARK-17592
>                 URL: https://issues.apache.org/jira/browse/SPARK-17592
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Furcy Pin
> Hello,
> there seems to be an inconsistency between Spark and Hive when casting a 
> string to an Int. 
> With Hive:
> {code}
> select cast("0.4" as INT) ;
> > 0
> select cast("0.5" as INT) ;
> > 0
> select cast("0.6" as INT) ;
> > 0
> {code}
> With Spark-SQL:
> {code}
> select cast("0.4" as INT) ;
> > 0
> select cast("0.5" as INT) ;
> > 1
> select cast("0.6" as INT) ;
> > 1
> {code}
> Hive seems to perform floor(string.toDouble), while Spark seems to perform 
> round(string.toDouble).
> I'm not sure there is any ISO standard for this. MySQL behaves the same way 
> as Hive, while PostgreSQL does the equivalent of string.toInt and fails, as 
> a NumberFormatException would.
> Personally, I think Hive is right, hence my posting this here.

This message was sent by Atlassian JIRA
