[ https://issues.apache.org/jira/browse/SPARK-32123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32123:
------------------------------------

    Assignee:     (was: Apache Spark)

> [Python] Setting `spark.sql.session.timeZone` only partially respected
> ----------------------------------------------------------------------
>
>                 Key: SPARK-32123
>                 URL: https://issues.apache.org/jira/browse/SPARK-32123
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.0.0
>            Reporter: Toby Harradine
>            Priority: Major
>
> Reopening SPARK-25244 as it is unresolved as of versions 2.4.6 and 3.0.0.
> The setting {{spark.sql.session.timeZone}} is respected by PySpark when
> converting from and to Pandas, as described
> [here|http://spark.apache.org/docs/latest/sql-programming-guide.html#timestamp-with-time-zone-semantics].
> However, when timestamps are converted directly to Python's {{datetime}}
> objects, it is ignored and the system's timezone is used instead.
> This can be checked with the following code snippet:
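> {code:python}
> import pyspark.sql
> spark = (pyspark
>          .sql
>          .SparkSession
>          .builder
>          .master('local[1]')
>          .config("spark.sql.session.timeZone", "UTC")
>          .getOrCreate()
>         )
> # Casting the string to a timestamp reads it as wall-clock time in the session timezone (UTC).
> df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
> df = df.withColumn("ts", df["ts"].astype("timestamp"))
> # toPandas(): rendered in the session timezone.
> print(df.toPandas().iloc[0,0])
> # collect(): rendered in the system timezone.
> print(df.collect()[0][0])
> {code}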
> On my machine this prints the following (the exact result depends on the
> system timezone; mine is Europe/Berlin):
> {noformat}
> 2018-06-01 01:00:00
> 2018-06-01 03:00:00
> {noformat}
> Hence the method {{toPandas}} respected the timezone setting (UTC), but the
> method {{collect}} ignored it and converted the timestamp to my system's
> timezone.
> The cause of this behaviour is that the methods {{toInternal}} and
> {{fromInternal}} of PySpark's {{TimestampType}} class do not take the
> setting {{spark.sql.session.timeZone}} into account and use the system timezone instead.
>  
>  
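
A minimal pure-Python sketch of the mechanism described in the issue, runnable without Spark. It assumes that a {{TimestampType}} value is held internally as microseconds since the Unix epoch, and that the two code paths differ only in the timezone used to render that instant. The helpers {{to_internal}}, {{from_internal}} and {{local_naive_to_session}} are hypothetical: they model, rather than reproduce, PySpark's {{TimestampType.toInternal}} / {{fromInternal}}. A fixed UTC+2 offset stands in for Europe/Berlin in June (CEST).

{code:python}
import datetime

UTC = datetime.timezone.utc
# Assumption: a fixed UTC+2 offset stands in for Europe/Berlin in summer (CEST).
BERLIN_SUMMER = datetime.timezone(datetime.timedelta(hours=2))
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def to_internal(dt, tz):
    """Hypothetical helper: read a naive datetime as wall-clock time in `tz`
    and return microseconds since the Unix epoch."""
    aware = dt.replace(tzinfo=tz) if dt.tzinfo is None else dt
    return (aware - EPOCH) // ONE_MICROSECOND


def from_internal(ts_micros, tz):
    """Hypothetical helper: render epoch microseconds as a naive datetime in `tz`.
    Passing the system timezone mimics the reported collect() behaviour;
    passing the session timezone gives what the reporter expected."""
    seconds, micros = divmod(ts_micros, 1_000_000)
    aware = datetime.datetime.fromtimestamp(seconds, tz=tz)
    return aware.replace(microsecond=micros, tzinfo=None)


def local_naive_to_session(dt, session_tz):
    """Hypothetical client-side workaround: treat a naive datetime as system
    local time (as collect() returned it) and re-express it in `session_tz`."""
    # astimezone() on a naive datetime assumes it is in the system's local timezone.
    return dt.astimezone(session_tz).replace(tzinfo=None)


# "2018-06-01 01:00:00" parsed under spark.sql.session.timeZone=UTC
ts = to_internal(datetime.datetime(2018, 6, 1, 1, 0, 0), UTC)

print(from_internal(ts, UTC))            # 2018-06-01 01:00:00 (session timezone, like toPandas)
print(from_internal(ts, BERLIN_SUMMER))  # 2018-06-01 03:00:00 (system timezone, like collect)

# Simulate collect(): render the instant in the *system* timezone of this process.
collected = datetime.datetime.fromtimestamp(ts // 1_000_000)
print(local_naive_to_session(collected, UTC))  # 2018-06-01 01:00:00
{code}

The last helper is only a possible client-side workaround under the same assumptions; wall-clock times around a DST transition in the system timezone may not round-trip reliably.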



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
