[ https://issues.apache.org/jira/browse/SPARK-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147464#comment-17147464 ]

Toby Harradine commented on SPARK-25244:
----------------------------------------

Thanks for letting me know.

I've just created SPARK-32123, which marks the affected version as 3.0.0.

> [Python] Setting `spark.sql.session.timeZone` only partially respected
> ----------------------------------------------------------------------
>
>                 Key: SPARK-25244
>                 URL: https://issues.apache.org/jira/browse/SPARK-25244
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Anton Daitche
>            Priority: Major
>              Labels: bulk-closed
>
> The setting `spark.sql.session.timeZone` is respected by PySpark when 
> converting from and to Pandas, as described 
> [here|http://spark.apache.org/docs/latest/sql-programming-guide.html#timestamp-with-time-zone-semantics].
>  However, when timestamps are converted directly to Python's `datetime` 
> objects, it is ignored and the system's timezone is used.
> This can be verified with the following code snippet:
> {code:python}
> import pyspark.sql
> # Create a local session with the session timezone set to UTC.
> spark = (pyspark
>          .sql
>          .SparkSession
>          .builder
>          .master('local[1]')
>          .config("spark.sql.session.timeZone", "UTC")
>          .getOrCreate()
>         )
> # Build a single-row DataFrame and cast the string column to a timestamp.
> df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
> df = df.withColumn("ts", df["ts"].astype("timestamp"))
> # toPandas() respects spark.sql.session.timeZone, collect() does not.
> print(df.toPandas().iloc[0,0])
> print(df.collect()[0][0])
> {code}
> For me, this prints the following (the exact result depends on your system's 
> timezone; mine is Europe/Berlin):
> {code:none}
> 2018-06-01 01:00:00
> 2018-06-01 03:00:00
> {code}
> Hence, the method `toPandas` respected the timezone setting (UTC), but the 
> method `collect` ignored it and converted the timestamp to my system's 
> timezone.
> The cause of this behaviour is that the methods `toInternal` and 
> `fromInternal` of PySpark's `TimestampType` class do not take the setting 
> `spark.sql.session.timeZone` into account and instead use the system timezone.
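> Until this is fixed, a possible interim workaround (only a sketch on my side, 
> assuming Python 3.6+ semantics, where `astimezone` treats naive datetimes as 
> system local time) is to re-interpret the collected naive datetimes as local 
> time and convert them back to the session timezone, here UTC:
> {code:python}
> from datetime import timezone
> # collect() returns naive datetimes rendered in the system timezone;
> # converting them from local time to UTC recovers the session-timezone value.
> local_ts = df.collect()[0][0]
> utc_ts = local_ts.astimezone(timezone.utc).replace(tzinfo=None)
> print(utc_ts)  # 2018-06-01 01:00:00, matching toPandas()
> {code}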
> If the maintainers agree that this should be fixed, I would try to come up 
> with a patch. 
>  
>  


