[
https://issues.apache.org/jira/browse/SPARK-11415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14981968#comment-14981968
]
Russell Alexander Spitzer edited comment on SPARK-11415 at 10/30/15 6:20 AM:
-----------------------------------------------------------------------------
Some tests are now broken; investigating.
The fix in SPARK-6785 seems off to me.
In it, 1 second before epoch and 1 second after epoch are 1 day apart. This
should not be true; they should both be equivalently far (in days) from epoch 0.
Actually, I'm not sure about this now...
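As a rough illustration of the point being questioned (plain Java, not Spark code), floor-based bucketing puts -1 second and +1 second into adjacent day numbers, which may actually be the intended day semantics rather than a distance-from-epoch measure:

```java
public class EpochFloorDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Same floor-based bucketing as the SPARK-6785 fix: negative millis
    // land in day -1, small positive millis land in day 0.
    static int millisToDayNumber(long millisUtc) {
        return (int) Math.floor((double) millisUtc / MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        System.out.println(millisToDayNumber(-1000L)); // -1 (1969-12-31)
        System.out.println(millisToDayNumber(1000L));  // 0  (1970-01-01)
    }
}
```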
> Catalyst DateType Shifts Input Data by Local Timezone
> -----------------------------------------------------
>
> Key: SPARK-11415
> URL: https://issues.apache.org/jira/browse/SPARK-11415
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0, 1.5.1
> Reporter: Russell Alexander Spitzer
>
> I've been running type tests for the Spark Cassandra Connector and couldn't
> get a consistent result for java.sql.Date. I investigated and noticed the
> following code is used to create Catalyst.DateTypes
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L139-L144
> {code}
> /**
>  * Returns the number of days since epoch from java.sql.Date.
>  */
> def fromJavaDate(date: Date): SQLDate = {
>   millisToDays(date.getTime)
> }
> {code}
> But millisToDays does not abide by this contract, shifting the underlying
> timestamp to the local timezone before calculating the days from epoch. This
> causes the invocation to move the actual date around.
> {code}
> // we should use the exact day as Int, for example, (year, month, day) -> day
> def millisToDays(millisUtc: Long): SQLDate = {
>   // SPARK-6785: use Math.floor so negative number of days (dates before 1970)
>   // will correctly work as input for function toJavaDate(Int)
>   val millisLocal = millisUtc + threadLocalLocalTimeZone.get().getOffset(millisUtc)
>   Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
> }
> {code}
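As a rough sketch of the shift described above (plain Java, with the time zone passed explicitly instead of read from a thread-local; not the actual Spark code), the same UTC instant lands in different day numbers depending on the local zone:

```java
import java.util.TimeZone;

public class LocalOffsetDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Same shape as the millisToDays logic quoted above.
    static int millisToDays(long millisUtc, TimeZone tz) {
        long millisLocal = millisUtc + tz.getOffset(millisUtc);
        return (int) Math.floor((double) millisLocal / MILLIS_PER_DAY);
    }

    public static void main(String[] args) {
        long epoch = 0L; // 1970-01-01T00:00:00Z
        // UTC keeps the epoch instant in day 0, but a zone behind UTC
        // (PST is UTC-8 in January 1970) shifts it into day -1.
        System.out.println(millisToDays(epoch, TimeZone.getTimeZone("UTC")));                  // 0
        System.out.println(millisToDays(epoch, TimeZone.getTimeZone("America/Los_Angeles"))); // -1
    }
}
```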
> The inverse function incorrectly applies the same timezone shift:
> {code}
> // reverse of millisToDays
> def daysToMillis(days: SQLDate): Long = {
>   val millisUtc = days.toLong * MILLIS_PER_DAY
>   millisUtc - threadLocalLocalTimeZone.get().getOffset(millisUtc)
> }
> {code}
> https://github.com/apache/spark/blob/bb3b3627ac3fcd18be7fb07b6d0ba5eae0342fc3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L81-L93
> This will cause off-by-one errors, and could cause significant shifts in data
> if the underlying data is worked on in timezones other than UTC.
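The cross-timezone round trip can be sketched as follows (plain Java re-implementations of the two functions quoted in the description, with the time zone as an explicit parameter; not the actual Spark code). A date serialized under one local zone deserializes as a different day under another:

```java
import java.util.TimeZone;

public class DateShiftDemo {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    static int millisToDays(long millisUtc, TimeZone tz) {
        long millisLocal = millisUtc + tz.getOffset(millisUtc);
        return (int) Math.floor((double) millisLocal / MILLIS_PER_DAY);
    }

    static long daysToMillis(int days, TimeZone tz) {
        long millisUtc = (long) days * MILLIS_PER_DAY;
        return millisUtc - tz.getOffset(millisUtc);
    }

    public static void main(String[] args) {
        TimeZone tokyo = TimeZone.getTimeZone("Asia/Tokyo");
        TimeZone la = TimeZone.getTimeZone("America/Los_Angeles");

        // Day 0 (1970-01-01) written with a Tokyo-local clock is the
        // UTC instant 1969-12-31T15:00:00Z...
        long millis = daysToMillis(0, tokyo);
        // ...which reads back unchanged in Tokyo, but as day -1
        // (1969-12-31) with a Los Angeles clock.
        System.out.println(millisToDays(millis, tokyo)); // 0
        System.out.println(millisToDays(millis, la));    // -1
    }
}
```

The round trip is stable only when the writer and reader share a timezone, which is why results look consistent in single-zone tests and drift across zones.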
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)