[ https://issues.apache.org/jira/browse/SPARK-30696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17351464#comment-17351464 ]
dc-heros edited comment on SPARK-30696 at 5/26/21, 3:54 AM:
------------------------------------------------------------

from_utc_timestamp and to_utc_timestamp produce wrong results on Daylight Saving Time change days.

For example, in LA in 1960 the timezone switched from UTC-7h to UTC-8h at 2 AM on 1960-09-25, but the previous version had the cutoff at 8 AM. Because of this, 1960-09-25 01:30:00 in LA can correspond to both 1960-09-25 08:30:00 UTC and 1960-09-25 09:30:00 UTC, and from_utc_timestamp just picks one of them, so these functions are wrong only around the cutoff time.

Could you edit the description [~maxgekk]

> Wrong result of the combination of from_utc_timestamp and to_utc_timestamp
> --------------------------------------------------------------------------
>
>                 Key: SPARK-30696
>                 URL: https://issues.apache.org/jira/browse/SPARK-30696
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Max Gekk
>            Priority: Major
>
> Applying to_utc_timestamp() to results of from_utc_timestamp() should return
> the original timestamp in the same time zone.
> In the range of 100 years, the combination of functions returns wrong results 280 times out of 1753200:
> {code:java}
> scala> val SECS_PER_YEAR = (36525L * 24 * 60 * 60) / 100
> SECS_PER_YEAR: Long = 31557600
>
> scala> val SECS_PER_MINUTE = 60L
> SECS_PER_MINUTE: Long = 60
>
> scala> val tz = "America/Los_Angeles"
> tz: String = America/Los_Angeles
>
> scala> val df = spark.range(-50 * SECS_PER_YEAR, 50 * SECS_PER_YEAR, 30 * SECS_PER_MINUTE)
> df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
>
> scala> val diff = df.select((to_utc_timestamp(from_utc_timestamp($"id".cast("timestamp"), tz), tz).cast("long") - $"id").as("diff")).filter($"diff" !== 0)
> warning: there was one deprecation warning; re-run with -deprecation for details
> diff: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [diff: bigint]
>
> scala> diff.count
> res14: Long = 280
>
> scala> df.count
> res15: Long = 1753200
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
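As a side note, the DST overlap described in the comment can be checked outside Spark with plain java.time, since the JDK zone rules know that 1960-09-25 01:30 occurred twice in America/Los_Angeles. A minimal sketch (the object name DstOverlap is just for illustration):

```scala
import java.time.{LocalDateTime, ZoneId}

object DstOverlap {
  def main(args: Array[String]): Unit = {
    val zone = ZoneId.of("America/Los_Angeles")
    // This wall-clock time occurred twice when DST ended on 1960-09-25
    val local = LocalDateTime.of(1960, 9, 25, 1, 30)

    // During an overlap, the zone rules report two valid offsets
    // (here UTC-7 and UTC-8), i.e. two possible UTC instants
    val offsets = zone.getRules.getValidOffsets(local)
    println(s"valid offsets: $offsets")
    offsets.forEach(off => println(s"$local at $off -> ${local.toInstant(off)}"))
  }
}
```

The two printed instants are 1960-09-25 08:30:00 and 09:30:00 UTC, matching the ambiguity the comment describes; from_utc_timestamp has to pick one of them.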