[ https://issues.apache.org/jira/browse/SPARK-22049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170205#comment-16170205 ]
Sean Owen commented on SPARK-22049:
-----------------------------------

Questions should normally go to the mailing list, but I'll answer here. I recall looking at this a while ago and finding the explanation in Hive, where this comes from, confusingly informal: it mixes up the ideas of a UNIX timestamp, a time-zone-less timestamp in HQL, and a time with a time zone. I think the functionality is correct if you compare it with the examples in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF .

Roughly, if you give a time zone that's GMT+n, then from_utc_timestamp adds n hours to the time and to_utc_timestamp subtracts n hours. The idea behind from_utc_timestamp is:

- The input is 1500000000000, a UNIX timestamp, which unambiguously represents a point in time and so has no notion of a time zone
- It's construed as a time w.r.t. UTC, to get "2017-07-14 02:40": a timestamp without a time zone per se, but conceptually in UTC
- Figure out what this same moment in time would be rendered as in GMT+1: "2017-07-14 03:40", another timestamp, conceptually in GMT+1

That gives the answer above. Then unix_timestamp construes "2017-07-14 03:40" as a UTC time, and you get back a time an hour later. I'm open to better ideas about the text, though I think it's actually correct.

> Confusing behavior of from_utc_timestamp and to_utc_timestamp
> -------------------------------------------------------------
>
>                 Key: SPARK-22049
>                 URL: https://issues.apache.org/jira/browse/SPARK-22049
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Felipe Olmos
>            Priority: Minor
>              Labels: spark-sql
>
> Hello everyone,
> I am confused about the behavior of the functions {{from_utc_timestamp}} and {{to_utc_timestamp}}.
> As an example, run the following code in a spark shell (note that {{org.apache.spark.sql.functions._}} must be imported for {{from_utc_timestamp}}, {{to_utc_timestamp}}, and {{unix_timestamp}}):
> {code:java}
> import java.sql.Timestamp
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
>
> // 2017-07-14 02:40 UTC
> val rdd = sc.parallelize(Row(new Timestamp(1500000000000L)) :: Nil)
> val df = spark.createDataFrame(rdd, StructType(StructField("date", TimestampType) :: Nil))
>
> df.select(
>   df("date"),
>   from_utc_timestamp(df("date"), "GMT+01:00") as "from_utc",
>   to_utc_timestamp(df("date"), "GMT+01:00") as "to_utc"
> ).show(1, false)
> // Date format printing is dependent on the timezone of the machine.
> // The following is in UTC
> // +---------------------+---------------------+---------------------+
> // |date                 |from_utc             |to_utc               |
> // +---------------------+---------------------+---------------------+
> // |2017-07-14 02:40:00.0|2017-07-14 03:40:00.0|2017-07-14 01:40:00.0|
> // +---------------------+---------------------+---------------------+
>
> df.select(
>   unix_timestamp(df("date")) as "date",
>   unix_timestamp(from_utc_timestamp(df("date"), "GMT+01:00")) as "from_utc",
>   unix_timestamp(to_utc_timestamp(df("date"), "GMT+01:00")) as "to_utc"
> ).show(1, false)
> // +----------+----------+----------+
> // |date      |from_utc  |to_utc    |
> // +----------+----------+----------+
> // |1500000000|1500003600|1499996400|
> // +----------+----------+----------+
> {code}
> So, if I interpret this correctly, {{from_utc_timestamp}} took {{02:40 UTC}}, interpreted it as {{03:40 GMT+1}} (the same timestamp), and transformed it to {{03:40 UTC}}. However, the description of {{from_utc_timestamp}} says
> bq. Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.
> I would then have expected the function to take {{02:40 UTC}} and return {{02:40 GMT+1 = 01:40 UTC}}. In fact, I think the descriptions of {{from_utc_timestamp}} and {{to_utc_timestamp}} are inverted.
> Am I interpreting this right?
> Thanks in advance
> Felipe

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
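The semantics described in the comment above can be sketched outside Spark with plain java.time. This is a hypothetical re-implementation for illustration only, not Spark's actual code (the real functions also cope with DST-aware zone IDs and SQL timestamp types); the function names {{fromUtcTimestamp}}/{{toUtcTimestamp}} are made up here:

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical sketch, not Spark's implementation.
// from_utc_timestamp: render the instant as a wall clock in the target
// zone, then reinterpret that wall clock as if it were UTC.
def fromUtcTimestamp(epochMillis: Long, zone: String): Long =
  LocalDateTime
    .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneId.of(zone))
    .toInstant(ZoneOffset.UTC)
    .toEpochMilli

// to_utc_timestamp: read the instant's UTC wall clock, then reinterpret
// that wall clock in the target zone -- the inverse shift.
def toUtcTimestamp(epochMillis: Long, zone: String): Long =
  LocalDateTime
    .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.UTC)
    .atZone(ZoneId.of(zone))
    .toInstant
    .toEpochMilli

// 1500000000000L is 2017-07-14 02:40 UTC; GMT+01:00 shifts it by one hour
// in each direction, matching the tables in the issue description.
println(fromUtcTimestamp(1500000000000L, "GMT+01:00")) // 1500003600000
println(toUtcTimestamp(1500000000000L, "GMT+01:00"))   // 1499996400000
```

Note that the two shifts are inverses of each other, which is why chaining them returns the original timestamp.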