Hi, Anton.

It’s the same result with Hive, isn’t it?

hive> select 9.223372036854786E20, ceil(9.223372036854786E20);
OK
_c0      _c1
9.223372036854786E20         9223372036854775807
Time taken: 2.041 seconds, Fetched: 1 row(s)
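
Both seem to go through the JVM's double-to-long conversion, which saturates at Long.MaxValue instead of overflowing. A minimal plain-Scala illustration, assuming that conversion is the common cause (no Spark or Hive needed):

// math.ceil keeps the double intact; the subsequent cast to long saturates.
val d = 9.223372036854786E20   // greater than Long.MaxValue
println(math.ceil(d))          // 9.223372036854786E20
println(math.ceil(d).toLong)   // 9223372036854775807 == Long.MaxValue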

Best,
Dongjoon.

From: Anton Okolnychyi <anton.okolnyc...@gmail.com>
Date: Friday, May 19, 2017 at 7:26 AM
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: [Spark SQL] ceil and floor functions on doubles

Hi all,

I am wondering why the results of the ceil and floor functions on doubles are 
internally cast to longs. This causes a loss of precision, since doubles can 
represent values larger than Long.MaxValue.

Consider the following example:

// 9.223372036854786E20 is greater than Long.MaxValue
val df = sc.parallelize(Array(("col", 9.223372036854786E20))).toDF()
df.createOrReplaceTempView("tbl")
spark.sql("select _2 AS original_value, ceil(_2) as ceil_result from 
tbl").show()

+--------------------+-------------------+
|      original_value|        ceil_result|
+--------------------+-------------------+
|9.223372036854786E20|9223372036854775807|
+--------------------+-------------------+

So, the original double value is collapsed to 9223372036854775807, which is 
Long.MaxValue.
I think it would be better to return 9.223372036854786E20 unchanged (as 
math.ceil actually does before the cast to long). If this is considered a 
problem, I can fix it.
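
To make the intent concrete, here is a hypothetical sketch of returning the 
double as-is instead of casting (the function names are illustrative, not 
Spark internals):

// Hypothetical sketch of the proposed behavior, not Spark's actual code.
def ceilKeepingDouble(d: Double): Double = math.ceil(d)     // proposed: no cast
def ceilAsLongToday(d: Double): Long = math.ceil(d).toLong  // current: saturates

println(ceilKeepingDouble(9.223372036854786E20))  // 9.223372036854786E20
println(ceilAsLongToday(9.223372036854786E20))    // 9223372036854775807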

Best regards,
Anton
