[ 
https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995934#comment-14995934
 ] 

Russell Alexander Spitzer commented on SPARK-11415:
---------------------------------------------------

I re-read a bunch of the java.sql.Date docs and now I think that the current 
code is actually OK based on the JavaDocs. I'll just have to make sure that we 
don't use the java.sql.Date(long timestamp) constructor, since it doesn't do 
the timezone adjustment that the valueOf method does.

I find this behavior a bit odd, but it seems to be a long-known quirk of 
java.util.Date vs. java.sql.Date.
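
For reference, here is a minimal standalone sketch of the difference (assuming 
a JVM default timezone of America/Los_Angeles; hypothetical demo code, not from 
Spark or the connector):

{code}
import java.sql.Date
import java.util.TimeZone

object DateConstructorDemo {
  def main(args: Array[String]): Unit = {
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))

    // valueOf interprets the string as midnight in the JVM's local timezone.
    val viaValueOf = Date.valueOf("2015-10-31")
    // The long constructor wraps the raw UTC millis with no adjustment.
    val viaLongCtor = new Date(1446249600000L) // 2015-10-31T00:00:00Z

    println(viaValueOf.getTime)  // 1446274800000, i.e. 2015-10-31 07:00:00 UTC
    println(viaLongCtor.getTime) // 1446249600000
  }
}
{code}

The two results differ by exactly the local UTC offset (7 hours for PDT on that 
date), which is the adjustment valueOf performs and the long constructor skips.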

> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
>                 Key: SPARK-11415
>                 URL: https://issues.apache.org/jira/browse/SPARK-11415
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>            Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't 
> get a consistent result for java.sql.Date. I investigated and noticed that 
> the following code is used to create Catalyst DateType values:
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
>  /**
>    * Returns the number of days since epoch from java.sql.Date.
>    */
>   def fromJavaDate(date: Date): SQLDate = {
>     millisToDays(date.getTime)
>   }
> {code}
> But millisToDays does not abide by this contract: it shifts the underlying 
> timestamp into the local timezone before calculating the days since the 
> epoch, which can move the resulting date to an adjacent day (a standalone 
> sketch after this quoted description reproduces the shift).
> {code}
>   // we should use the exact day as Int, for example, (year, month, day) -> day
>   def millisToDays(millisUtc: Long): SQLDate = {
>     // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>     // will correctly work as input for function toJavaDate(Int)
>     val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>     Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
>   }
> {code}
> The inverse function also incorrectly shifts the result by the local timezone offset:
> {code}
>   // reverse of millisToDays
>   def daysToMillis(days: SQLDate): Long = {
>     val millisUtc = days.toLong * MILLIS_PER_DAY
>     millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one errors and could shift dates significantly if the 
> underlying data is handled in timezones other than UTC (the sketches below 
> walk through a concrete example).
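
As a concrete illustration, here is a minimal standalone sketch that replicates 
the quoted millisToDays logic, using java.util.TimeZone directly in place of 
Spark's threadLocalLocalTimeZone (illustrative code, not the Spark source):

{code}
import java.util.TimeZone

object MillisToDaysDemo {
  val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

  // Same logic as the quoted DateTimeUtils.millisToDays.
  def millisToDays(millisUtc: Long): Int = {
    val millisLocal = millisUtc + TimeZone.getDefault.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
  }

  def main(args: Array[String]): Unit = {
    // 2015-10-31T00:00:00Z is exactly 16739 days after the epoch.
    val utcMidnight = 16739L * MILLIS_PER_DAY

    TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
    println(millisToDays(utcMidnight)) // 16739, as expected

    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
    println(millisToDays(utcMidnight)) // 16738 -- the date has moved back a day
  }
}
{code}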

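A second sketch shows the inverse shift and the resulting cross-timezone 
off-by-one: a day count computed while the JVM is in America/Los_Angeles and 
converted back to millis in UTC comes out a day early. Both helpers again just 
replicate the quoted DateTimeUtils logic for illustration:

{code}
import java.sql.Date
import java.util.TimeZone

object CrossTimezoneDemo {
  val MILLIS_PER_DAY: Long = 24L * 60 * 60 * 1000

  def millisToDays(millisUtc: Long): Int = {
    val millisLocal = millisUtc + TimeZone.getDefault.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
  }

  def daysToMillis(days: Int): Long = {
    val millisUtc = days.toLong * MILLIS_PER_DAY
    millisUtc - TimeZone.getDefault.getOffset(millisUtc)
  }

  def main(args: Array[String]): Unit = {
    // Write side: JVM in Los Angeles, the value arrives as 2015-10-31 midnight UTC.
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
    val storedDays = millisToDays(16739L * MILLIS_PER_DAY) // 16738, not 16739

    // Read side: a JVM in UTC turns the stored day count back into a date.
    TimeZone.setDefault(TimeZone.getTimeZone("UTC"))
    println(new Date(daysToMillis(storedDays))) // 2015-10-30 -- one day off
  }
}
{code}

The stored day count depends on the writer's timezone, which matches the 
inconsistent results described in the report above.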

