[ https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982024#comment-14982024 ]

Russell Alexander Spitzer commented on SPARK-11415:
---------------------------------------------------

I added another commit to fix up the tests. The test code will now run 
identically no matter what time zone it happens to be run in (even UTC).
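The comment does not show the test change itself. The following is a minimal sketch of one common way to make a test time-zone independent: run the same assertions under several JVM default zones, including UTC. This is an assumption, not necessarily what the commit does, and `TimeZoneTestSketch` and `withDefaultTimeZone` are hypothetical names. Code that caches the zone in a `ThreadLocal` (like `threadLocalLocalTimeZone` in the code quoted below) may not see a default changed after the thread first read it.

```scala
import java.sql.Date
import java.util.TimeZone

object TimeZoneTestSketch {
  // Zones with negative, zero and positive UTC offsets.
  val zoneIds: Seq[String] = Seq("America/Los_Angeles", "UTC", "Asia/Tokyo")

  // Hypothetical helper: swap the JVM default time zone, evaluate `block`,
  // and always restore the previous default, even if `block` throws.
  def withDefaultTimeZone[T](tz: TimeZone)(block: => T): T = {
    val original = TimeZone.getDefault
    TimeZone.setDefault(tz)
    try block
    finally TimeZone.setDefault(original)
  }

  def main(args: Array[String]): Unit = {
    for (id <- zoneIds) {
      withDefaultTimeZone(TimeZone.getTimeZone(id)) {
        // Example invariant that should hold in every zone: a date parsed
        // from "yyyy-mm-dd" prints back as the same string.
        val d = Date.valueOf("2015-10-29")
        assert(d.toString == "2015-10-29", s"mismatch under $id")
        println(s"ok under ${TimeZone.getDefault.getID}")
      }
    }
  }
}
```

A real test would put its date-conversion assertions inside the `withDefaultTimeZone` block.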

> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
>                 Key: SPARK-11415
>                 URL: https://issues.apache.org/jira/browse/SPARK-11415
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't 
> get a consistent result for java.sql.Date. I investigated and noticed that the 
> following code is used to create Catalyst DateType values:
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
>   /**
>    * Returns the number of days since epoch from java.sql.Date.
>    */
>   def fromJavaDate(date: Date): SQLDate = {
>     millisToDays(date.getTime)
>   }
> {code}
> But millisToDays does not abide by this contract: it shifts the underlying 
> timestamp by the local time-zone offset before calculating the days since 
> epoch, so the resulting day (and therefore the date) depends on the local 
> time zone.
> {code}
>   // we should use the exact day as Int, for example, (year, month, day) -> day
>   def millisToDays(millisUtc: Long): SQLDate = {
>     // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>     // will correctly work as input for function toJavaDate(Int)
>     val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>     Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
>   }
> {code}
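A minimal sketch (not Spark code) of the behaviour described above. `MillisToDaysSketch` copies the arithmetic of the quoted millisToDays but takes the `TimeZone` as a parameter instead of reading `threadLocalLocalTimeZone`; the object name and that parameter are assumptions made for illustration. It converts two instants, UTC midnight of 1970-01-01 and Los Angeles local midnight of the same date, under three zones.

```scala
import java.util.TimeZone

object MillisToDaysSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Same arithmetic as the quoted millisToDays, except the zone is passed
  // in explicitly instead of being read from a thread-local default.
  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }

  def main(args: Array[String]): Unit = {
    val utcMidnight = 0L                 // 1970-01-01T00:00:00Z
    val laMidnight = 8L * 60 * 60 * 1000 // 1970-01-01T00:00:00 in Los Angeles (UTC-8)
    for (id <- Seq("UTC", "America/Los_Angeles", "Asia/Tokyo")) {
      val tz = TimeZone.getTimeZone(id)
      println(s"$id: utcMidnight -> ${millisToDays(utcMidnight, tz)}, " +
        s"laMidnight -> ${millisToDays(laMidnight, tz)}")
    }
  }
}
```

Under these assumptions, only America/Los_Angeles maps the UTC-midnight instant to day -1 (1969-12-31), while the Los Angeles local-midnight instant maps to day 0 in all three zones. Whether the day count comes out as intended therefore depends on whether the input millis encode UTC midnight or local midnight in the converting zone.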
> The inverse function, daysToMillis, also incorrectly shifts by the local time-zone offset:
> {code}
>   // reverse of millisToDays
>   def daysToMillis(days: SQLDate): Long = {
>     val millisUtc = days.toLong * MILLIS_PER_DAY
>     millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one-day errors and could significantly shift data if 
> the underlying data is processed in a time zone other than UTC.
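To illustrate the cross-time-zone shift described in the issue, here is a second minimal sketch. It is again a reimplementation for illustration, not Spark's API: `RoundTripSketch` is a hypothetical name, and both functions take the zone explicitly instead of using `threadLocalLocalTimeZone`. Within one zone the quoted millisToDays/daysToMillis arithmetic round-trips a day count, but a day stored as millis under one zone and read back under another can move by a day.

```scala
import java.util.TimeZone

object RoundTripSketch {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Same shape as the quoted millisToDays, with the zone passed explicitly.
  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }

  // Same shape as the quoted daysToMillis, with the zone passed explicitly.
  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val millisUtc = days.toLong * MillisPerDay
    millisUtc - tz.getOffset(millisUtc)
  }

  def main(args: Array[String]): Unit = {
    val day = 0 // 1970-01-01
    for (id <- Seq("UTC", "America/Los_Angeles", "Asia/Tokyo")) {
      val tz = TimeZone.getTimeZone(id)
      val millis = daysToMillis(day, tz)
      // Within a single zone the day count survives the round trip...
      assert(millisToDays(millis, tz) == day)
      // ...but the instant it is stored as differs from zone to zone.
      println(s"$id: day $day -> $millis ms since epoch")
    }
    // Cross-zone read: stored under Asia/Tokyo, read back under America/Los_Angeles.
    val stored = daysToMillis(day, TimeZone.getTimeZone("Asia/Tokyo"))
    val readBack = millisToDays(stored, TimeZone.getTimeZone("America/Los_Angeles"))
    println(s"Tokyo -> Los Angeles: day $readBack")
  }
}
```

With these zones, day 0 stored under Asia/Tokyo becomes -32400000 ms, which reads back as day -1 (1969-12-31) under America/Los_Angeles.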



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

