Felipe Olmos created SPARK-22049:
------------------------------------

             Summary: Confusing behavior of from_utc_timestamp and 
to_utc_timestamp
                 Key: SPARK-22049
                 URL: https://issues.apache.org/jira/browse/SPARK-22049
             Project: Spark
          Issue Type: Question
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Felipe Olmos
            Priority: Minor


Hello everyone,

I am confused about the behavior of the functions {{from_utc_timestamp}} and 
{{to_utc_timestamp}}. As an example, take the following code to a spark shell

{code:java}
import java.sql.Timestamp 
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// 2017-07-14 02:40 UTC
val rdd = sc.parallelize(Row(new Timestamp(1500000000000L)) :: Nil)
val df = spark.createDataFrame(rdd, StructType(StructField("date", 
TimestampType) :: Nil))

df.select(df("date"), from_utc_timestamp(df("date"), "GMT+01:00") as 
"from_utc", to_utc_timestamp(df("date"), "GMT+01:00") as "to_utc").show(1, 
false)

// Date format printing is dependent on the timezone of the machine. 
// The following is in UTC
// +---------------------+---------------------+---------------------+          
   
// |date                 |from_utc             |to_utc               |
// +---------------------+---------------------+---------------------+
// |2017-07-14 02:40:00.0|2017-07-14 03:40:00.0|2017-07-14 01:40:00.0|
// +---------------------+---------------------+---------------------+

df.select(unix_timestamp(df("date")) as "date", 
unix_timestamp(from_utc_timestamp(df("date"), "GMT+01:00")) as "from_utc", 
unix_timestamp(to_utc_timestamp(df("date"),  "GMT+01:00")) as "to_utc").show(1, 
false)
// +----------+----------+----------+
// |date      |from_utc  |to_utc    |
// +----------+----------+----------+
// |1500000000|1500003600|1499996400|
// +----------+----------+----------+
{code}

So if interpret correctly {{from_utc_timestamp}}, took {{02:40 UTC}} 
interpreted it as {{03:40 GMT+1}} (same timestamp) and transformed it to 
{{03:40 UTC}}. However the description of {{from_utc_timestamp}} says

bq. Given a timestamp, which corresponds to a certain time of day in UTC, 
returns another timestamp that corresponds to the same time of day in the given 
timezone. 

I would have expected then that the function takes {{02:40 UTC}} and return 
{{02:40 GMT+1 = 01:40 UTC}}. In fact the descriptions of {{from_utc_timestamp}} 
and {{to_utc_timestamp}} seem inverted.

I am interpreting this right?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to