Repository: spark
Updated Branches:
  refs/heads/master 8bd27025b -> 504c9cfd2


[SPARK-24123][SQL] Fix precision issues in monthsBetween with more than 8 digits

## What changes were proposed in this pull request?

SPARK-23902 introduced the ability to retrieve more than 8 digits in 
`monthsBetween`. Unfortunately, current implementation can cause precision loss 
in such a case. This was causing also a flaky UT.

This PR mirrors Hive's implementation in order to avoid precision loss also 
when more than 8 digits are returned.

## How was this patch tested?

running 10000000 times the flaky UT

Author: Marco Gaido <marcogaid...@gmail.com>

Closes #21196 from mgaido91/SPARK-24123.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/504c9cfd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/504c9cfd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/504c9cfd

Branch: refs/heads/master
Commit: 504c9cfd21ef45a13d9428fef3b197dcbf6786cd
Parents: 8bd2702
Author: Marco Gaido <marcogaid...@gmail.com>
Authored: Wed May 2 13:49:15 2018 -0700
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Wed May 2 13:49:15 2018 -0700

----------------------------------------------------------------------
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/504c9cfd/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
----------------------------------------------------------------------
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index 4b00a61..d2fe15c 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -888,14 +888,19 @@ object DateTimeUtils {
     val months1 = year1 * 12 + monthInYear1
     val months2 = year2 * 12 + monthInYear2
 
+    val monthDiff = (months1 - months2).toDouble
+
     if (dayInMonth1 == dayInMonth2 || ((daysToMonthEnd1 == 0) && 
(daysToMonthEnd2 == 0))) {
-      return (months1 - months2).toDouble
+      return monthDiff
     }
-    // milliseconds is enough for 8 digits precision on the right side
-    val timeInDay1 = millis1 - daysToMillis(date1, timeZone)
-    val timeInDay2 = millis2 - daysToMillis(date2, timeZone)
-    val timesBetween = (timeInDay1 - timeInDay2).toDouble / MILLIS_PER_DAY
-    val diff = (months1 - months2).toDouble + (dayInMonth1 - dayInMonth2 + 
timesBetween) / 31.0
+    // using milliseconds can cause precision loss with more than 8 digits
+    // we follow Hive's implementation which uses seconds
+    val secondsInDay1 = (millis1 - daysToMillis(date1, timeZone)) / 1000L
+    val secondsInDay2 = (millis2 - daysToMillis(date2, timeZone)) / 1000L
+    val secondsDiff = (dayInMonth1 - dayInMonth2) * SECONDS_PER_DAY + 
secondsInDay1 - secondsInDay2
+    // 2678400D is the number of seconds in 31 days
+    // every month is considered to be 31 days long in this function
+    val diff = monthDiff + secondsDiff / 2678400D
     if (roundOff) {
       // rounding to 8 digits
       math.round(diff * 1e8) / 1e8


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

Reply via email to