[
https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14982024#comment-14982024
]
Russell Alexander Spitzer edited comment on SPARK-11415 at 10/30/15 7:14 AM:
-----------------------------------------------------------------------------
I added another commit to fix up the tests. The test code will now run
identically no matter what time zone it happens to be run in (even PDT).
was (Author: rspitzer):
I added another commit to fix up the tests. The test code will now run
identically no matter what time zone it happens to be run in (even UTC).
> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
> Key: SPARK-11415
> URL: https://issues.apache.org/jira/browse/SPARK-11415
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.5.1
> Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't
> get a consistent result for java.sql.Date. I investigated and noticed that the
> following code is used to create Catalyst DateType values:
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
> /**
>  * Returns the number of days since epoch from java.sql.Date.
>  */
> def fromJavaDate(date: Date): SQLDate = {
>   millisToDays(date.getTime)
> }
> {code}
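One literal reading of the doc comment above is that the day count should depend only on the Date's epoch milliseconds. The sketch below implements that reading by counting whole UTC days. The `UtcDays` object and `utcDaysFromJavaDate` method are hypothetical names, not part of Spark, and `Math.floorDiv` is swapped in for the `Math.floor` approach Spark uses.

```scala
import java.sql.Date

// Hypothetical helper (not a Spark API): counts whole UTC days since
// 1970-01-01T00:00:00Z, ignoring the JVM's default time zone entirely.
object UtcDays {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Math.floorDiv rounds toward negative infinity, so instants before the
  // epoch map to negative day numbers (e.g. -1 ms -> day -1).
  def utcDaysFromJavaDate(date: Date): Int =
    Math.floorDiv(date.getTime, MillisPerDay).toInt

  def main(args: Array[String]): Unit = {
    println(utcDaysFromJavaDate(new Date(0L)))  // prints 0
    println(utcDaysFromJavaDate(new Date(-1L))) // prints -1
  }
}
```

Under this reading the result is the same on every JVM, whatever its time zone.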
> But millisToDays does not abide by this contract: it shifts the underlying
> timestamp by the local time zone offset before calculating the days since
> epoch. As a result, the same java.sql.Date can map to a different day
> depending on the time zone of the JVM that performs the conversion.
> {code}
> // we should use the exact day as Int, for example, (year, month, day) -> day
> def millisToDays(millisUtc: Long): SQLDate = {
>   // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>   // will correctly work as input for function toJavaDate(Int)
>   val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
> }
> {code}
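To make the shift visible, the sketch below copies the arithmetic of the quoted `millisToDays` but takes the time zone as an explicit parameter instead of reading Spark's `threadLocalLocalTimeZone`. That parameter and the `LocalDays` object are assumptions for illustration. For the epoch instant 1970-01-01T00:00:00Z, a zone behind UTC (America/Los_Angeles, UTC-8 at that instant) yields day -1, while UTC and Asia/Tokyo yield day 0.

```scala
import java.util.TimeZone

// Same arithmetic as the quoted millisToDays, with the thread-local zone
// replaced by an explicit parameter (an illustrative assumption).
object LocalDays {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    // Shift the UTC instant into the zone's local wall-clock time.
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }

  def main(args: Array[String]): Unit = {
    // 1970-01-01T00:00:00Z evaluated in three zones.
    for (id <- Seq("UTC", "America/Los_Angeles", "Asia/Tokyo"))
      println(s"$id -> ${millisToDays(0L, TimeZone.getTimeZone(id))}")
  }
}
```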
> The inverse function, daysToMillis, also incorrectly applies the time zone shift:
> {code}
> // reverse of millisToDays
> def daysToMillis(days: SQLDate): Long = {
>   val millisUtc = days.toLong * MILLIS_PER_DAY
>   millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
> }
> {code}
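Under the same explicit-zone assumption, the quoted `daysToMillis` maps a day number back to local midnight in whichever zone is active, not to UTC midnight. `LocalMillis` is a hypothetical name; Spark's version reads the thread-local zone instead of taking a parameter.

```scala
import java.util.TimeZone

// Copy of the quoted daysToMillis arithmetic with the zone passed in
// explicitly (an illustrative assumption, not Spark's signature).
object LocalMillis {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val millisUtc = days.toLong * MillisPerDay
    // Subtract the zone offset so the result is local midnight in tz.
    millisUtc - tz.getOffset(millisUtc)
  }

  def main(args: Array[String]): Unit = {
    // Day 0 is 1970-01-01.
    for (id <- Seq("UTC", "America/Los_Angeles", "Asia/Tokyo"))
      println(s"$id -> ${daysToMillis(0, TimeZone.getTimeZone(id))}")
  }
}
```

Day 0 therefore comes back as 0 ms in UTC, 28800000 ms (08:00 UTC) for America/Los_Angeles, and -32400000 ms (15:00 UTC on 1969-12-31) for Asia/Tokyo.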
> Both functions are defined at https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one errors, and could shift dates significantly if
> the data is processed in time zones other than UTC.
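A minimal sketch of the cross-zone case, assuming a date is converted to milliseconds on a JVM set to Asia/Tokyo and read back on one set to America/Los_Angeles. Both functions repeat the quoted arithmetic with an explicit zone parameter, which is an assumption for illustration, and `ZoneShiftDemo` and `roundTrip` are hypothetical names. Reading back in the same zone recovers 2015-10-30; reading in the other zone yields 2015-10-29.

```scala
import java.time.LocalDate
import java.util.TimeZone

object ZoneShiftDemo {
  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Quoted millisToDays arithmetic, zone made explicit.
  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    Math.floor(millisLocal.toDouble / MillisPerDay).toInt
  }

  // Quoted daysToMillis arithmetic, zone made explicit.
  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val millisUtc = days.toLong * MillisPerDay
    millisUtc - tz.getOffset(millisUtc)
  }

  // Converts a date to millis in the writer's zone, then back to a date
  // in the reader's zone.
  def roundTrip(date: LocalDate, writer: TimeZone, reader: TimeZone): LocalDate = {
    val millis = daysToMillis(date.toEpochDay.toInt, writer)
    LocalDate.ofEpochDay(millisToDays(millis, reader).toLong)
  }

  def main(args: Array[String]): Unit = {
    val tokyo = TimeZone.getTimeZone("Asia/Tokyo")
    val la = TimeZone.getTimeZone("America/Los_Angeles")
    val date = LocalDate.of(2015, 10, 30)
    println(roundTrip(date, tokyo, tokyo)) // prints 2015-10-30
    println(roundTrip(date, tokyo, la))    // prints 2015-10-29
  }
}
```

The Tokyo writer stores 2015-10-29T15:00Z (local midnight); the Los Angeles reader, at UTC-7 on that date, sees 08:00 on 2015-10-29 and floors to that day.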
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)