[ 
https://issues.apache.org/jira/browse/SPARK-22049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170205#comment-16170205
 ] 

Sean Owen commented on SPARK-22049:
-----------------------------------

Questions should normally go to the mailing list, but I'll answer here.

I recall looking at this a while ago. The Hive documentation, which is where 
these functions come from, is confusingly informal: it mixes up the ideas of a 
UNIX timestamp, a time zone-less timestamp in HQL, and a time with a time 
zone. I think the functionality is correct if you compare it with the examples 
in https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF . 
Roughly, if you give a timezone that's GMT+n, then from_utc_timestamp adds n 
hours to the time and to_utc_timestamp subtracts n hours.

The idea behind from_utc_timestamp is:

- Input is 1500000000000, a UNIX timestamp, which unambiguously represents a 
point in time and so has no notion of a timezone
- It's construed as a time w.r.t. UTC, to get "2017-07-14 02:40", a 
timestamp without a timezone per se, but conceptually in UTC
- Figure out what this same moment in time would be rendered as in GMT+1: 
"2017-07-14 03:40", another timestamp, conceptually in GMT+1

That gives the from_utc value in your output. unix_timestamp then construes 
"2017-07-14 03:40" as a UTC time, so you get back 1500003600, an hour later 
than the input.
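
The steps above can be traced outside Spark with plain java.time. This is only 
a sketch of the semantics as I've described them, not Spark's implementation: 
fromUtc and toUtc are hypothetical helper names made up for illustration, and 
the final step assumes the wall-clock time is read back as UTC, as on the 
reporter's machine.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical helpers mimicking the described semantics (not Spark's code).
// fromUtc: treat the wall-clock time as UTC, render the same instant in `zone`.
def fromUtc(t: LocalDateTime, zone: String): LocalDateTime =
  t.atZone(ZoneOffset.UTC).withZoneSameInstant(ZoneId.of(zone)).toLocalDateTime

// toUtc: treat the wall-clock time as being in `zone`, render the same instant in UTC.
def toUtc(t: LocalDateTime, zone: String): LocalDateTime =
  t.atZone(ZoneId.of(zone)).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime

// Step 1: the UNIX timestamp is an unambiguous instant with no timezone.
val instant = Instant.ofEpochMilli(1500000000000L)

// Step 2: construe it w.r.t. UTC as a zone-less wall-clock time.
val utcWallClock = LocalDateTime.ofInstant(instant, ZoneOffset.UTC) // 2017-07-14T02:40

// Step 3: the same moment rendered in GMT+1 (one hour added).
val fromUtcResult = fromUtc(utcWallClock, "GMT+01:00")              // 2017-07-14T03:40

// The inverse direction subtracts the hour instead.
val toUtcResult = toUtc(utcWallClock, "GMT+01:00")                  // 2017-07-14T01:40

// Reading each shifted wall-clock time back as if it were UTC reproduces
// the numbers in the reporter's second table.
val fromUtcSeconds = fromUtcResult.toEpochSecond(ZoneOffset.UTC)    // 1500003600
val toUtcSeconds = toUtcResult.toEpochSecond(ZoneOffset.UTC)        // 1499996400
```

The point of the sketch is that neither function changes which timezone the 
result is "in"; each returns another zone-less timestamp, so anything 
downstream that treats it as UTC sees a one-hour shift.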

I'm open to better ideas about the text, though I think it's actually correct.

> Confusing behavior of from_utc_timestamp and to_utc_timestamp
> -------------------------------------------------------------
>
>                 Key: SPARK-22049
>                 URL: https://issues.apache.org/jira/browse/SPARK-22049
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Felipe Olmos
>            Priority: Minor
>              Labels: spark-sql
>
> Hello everyone,
> I am confused about the behavior of the functions {{from_utc_timestamp}} and 
> {{to_utc_timestamp}}. As an example, paste the following code into a Spark 
> shell:
> {code:java}
> import java.sql.Timestamp 
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> // 2017-07-14 02:40 UTC
> val rdd = sc.parallelize(Row(new Timestamp(1500000000000L)) :: Nil)
> val df = spark.createDataFrame(rdd,
>   StructType(StructField("date", TimestampType) :: Nil))
> df.select(
>   df("date"),
>   from_utc_timestamp(df("date"), "GMT+01:00") as "from_utc",
>   to_utc_timestamp(df("date"), "GMT+01:00") as "to_utc"
> ).show(1, false)
> // Date format printing is dependent on the timezone of the machine. 
> // The following is in UTC
> // +---------------------+---------------------+---------------------+
> // |date                 |from_utc             |to_utc               |
> // +---------------------+---------------------+---------------------+
> // |2017-07-14 02:40:00.0|2017-07-14 03:40:00.0|2017-07-14 01:40:00.0|
> // +---------------------+---------------------+---------------------+
> df.select(
>   unix_timestamp(df("date")) as "date",
>   unix_timestamp(from_utc_timestamp(df("date"), "GMT+01:00")) as "from_utc",
>   unix_timestamp(to_utc_timestamp(df("date"), "GMT+01:00")) as "to_utc"
> ).show(1, false)
> // +----------+----------+----------+
> // |date      |from_utc  |to_utc    |
> // +----------+----------+----------+
> // |1500000000|1500003600|1499996400|
> // +----------+----------+----------+
> {code}
> So, if I interpret this correctly, {{from_utc_timestamp}} took {{02:40 UTC}}, 
> interpreted it as {{03:40 GMT+1}} (same timestamp) and transformed it to 
> {{03:40 UTC}}. However, the description of {{from_utc_timestamp}} says
> bq. Given a timestamp, which corresponds to a certain time of day in UTC, 
> returns another timestamp that corresponds to the same time of day in the 
> given timezone. 
> I would then have expected the function to take {{02:40 UTC}} and return 
> {{02:40 GMT+1 = 01:40 UTC}}. In fact, I think the descriptions of 
> {{from_utc_timestamp}} and {{to_utc_timestamp}} are inverted.
> Am I interpreting this right?
> Thanks in advance
> Felipe



