[ https://issues.apache.org/jira/browse/SPARK-22049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Felipe Olmos updated SPARK-22049:
---------------------------------
Description:
Hello everyone,
I am confused about the behavior of the functions {{from_utc_timestamp}} and {{to_utc_timestamp}}. As an example, run the following code in a Spark shell:
{code:java}
import java.sql.Timestamp
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._ // from_utc_timestamp, to_utc_timestamp, unix_timestamp
import org.apache.spark.sql.types._

// 2017-07-14 02:40 UTC
val rdd = sc.parallelize(Row(new Timestamp(1500000000000L)) :: Nil)
val df = spark.createDataFrame(rdd, StructType(StructField("date", TimestampType) :: Nil))

df.select(
  df("date"),
  from_utc_timestamp(df("date"), "GMT+01:00") as "from_utc",
  to_utc_timestamp(df("date"), "GMT+01:00") as "to_utc"
).show(1, false)
// The printed timestamps depend on the machine's time zone.
// The output below was produced in UTC:
// +---------------------+---------------------+---------------------+
// |date                 |from_utc             |to_utc               |
// +---------------------+---------------------+---------------------+
// |2017-07-14 02:40:00.0|2017-07-14 03:40:00.0|2017-07-14 01:40:00.0|
// +---------------------+---------------------+---------------------+

df.select(
  unix_timestamp(df("date")) as "date",
  unix_timestamp(from_utc_timestamp(df("date"), "GMT+01:00")) as "from_utc",
  unix_timestamp(to_utc_timestamp(df("date"), "GMT+01:00")) as "to_utc"
).show(1, false)
// +----------+----------+----------+
// |date      |from_utc  |to_utc    |
// +----------+----------+----------+
// |1500000000|1500003600|1499996400|
// +----------+----------+----------+
{code}
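As a quick consistency check outside Spark, the epoch seconds in the second table can be decoded with plain {{java.time.Instant}}. This is a minimal sketch; the object name {{EpochCheck}} and the helper {{utc}} are only illustrative. Rendered in UTC, the values give the same wall-clock times as the first table, i.e. {{from_utc}} and {{to_utc}} are the original instant shifted by +3600 and -3600 seconds.

```scala
import java.time.Instant

// Illustrative helper (hypothetical name): render epoch seconds as an ISO-8601 UTC instant.
object EpochCheck {
  def utc(epochSec: Long): String = Instant.ofEpochSecond(epochSec).toString

  def main(args: Array[String]): Unit = {
    // Values copied from the unix_timestamp table above.
    val values = Seq("date" -> 1500000000L, "from_utc" -> 1500003600L, "to_utc" -> 1499996400L)
    values.foreach { case (name, sec) => println(s"$name: ${utc(sec)}") }
    // Prints:
    // date: 2017-07-14T02:40:00Z
    // from_utc: 2017-07-14T03:40:00Z
    // to_utc: 2017-07-14T01:40:00Z
  }
}
```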
So, if I interpret this correctly, {{from_utc_timestamp}} took {{02:40 UTC}}, interpreted it as {{03:40 GMT+1}} (the same instant), and transformed it into {{03:40 UTC}}. However, the documentation of {{from_utc_timestamp}} says:
bq. Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.
I would then have expected the function to take {{02:40 UTC}} and return {{02:40 GMT+1 = 01:40 UTC}}. In fact, I think the descriptions of {{from_utc_timestamp}} and {{to_utc_timestamp}} are swapped.
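For a fixed offset such as {{GMT+01:00}}, the observed numbers can be reproduced with a small plain-Scala model. This is only a sketch under the assumption that {{from_utc_timestamp}} reads the instant's wall-clock time in the target zone and reinterprets it as UTC, while {{to_utc_timestamp}} does the reverse; the names {{UtcShiftModel}}, {{fromUtc}} and {{toUtc}} are hypothetical, and this is not Spark's actual implementation. In this model, the value I expected from {{from_utc_timestamp}} (01:40 UTC, epoch 1499996400) is exactly what {{toUtc}} returns.

```scala
import java.time.{Instant, LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical model of the observed behavior, NOT Spark's implementation.
object UtcShiftModel {
  // Observed from_utc_timestamp: take the wall-clock time of the instant
  // in `zone`, then re-read that wall-clock time as if it were UTC.
  def fromUtc(epochSec: Long, zone: String): Long = {
    val wallClock = LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSec), ZoneId.of(zone))
    wallClock.toEpochSecond(ZoneOffset.UTC)
  }

  // Observed to_utc_timestamp: take the wall-clock time of the instant
  // in UTC, then re-read that wall-clock time as a time in `zone`.
  def toUtc(epochSec: Long, zone: String): Long = {
    val wallClock = LocalDateTime.ofInstant(Instant.ofEpochSecond(epochSec), ZoneOffset.UTC)
    wallClock.atZone(ZoneId.of(zone)).toEpochSecond
  }

  def main(args: Array[String]): Unit = {
    val t = 1500000000L // 2017-07-14 02:40 UTC
    println(fromUtc(t, "GMT+01:00")) // 1500003600, matches the from_utc column
    println(toUtc(t, "GMT+01:00"))   // 1499996400, matches the to_utc column
  }
}
```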
Am I interpreting this correctly?
Thanks in advance
Felipe
> Confusing behavior of from_utc_timestamp and to_utc_timestamp
> -------------------------------------------------------------
>
> Key: SPARK-22049
> URL: https://issues.apache.org/jira/browse/SPARK-22049
> Project: Spark
> Issue Type: Question
> Components: SQL
> Affects Versions: 2.1.0
> Reporter: Felipe Olmos
> Priority: Minor
> Labels: spark-sql
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)