[ https://issues.apache.org/jira/browse/SPARK-25244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147464#comment-17147464 ]
Toby Harradine commented on SPARK-25244:
----------------------------------------

Thanks for letting me know. I've just created SPARK-32123, which marks the affected version as 3.0.0.

> [Python] Setting `spark.sql.session.timeZone` only partially respected
> ----------------------------------------------------------------------
>
>                 Key: SPARK-25244
>                 URL: https://issues.apache.org/jira/browse/SPARK-25244
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.1
>            Reporter: Anton Daitche
>            Priority: Major
>              Labels: bulk-closed
>
> The setting `spark.sql.session.timeZone` is respected by PySpark when
> converting to and from Pandas, as described
> [here|http://spark.apache.org/docs/latest/sql-programming-guide.html#timestamp-with-time-zone-semantics].
> However, when timestamps are converted directly to Python's `datetime`
> objects, the setting is ignored and the system's timezone is used instead.
> This can be checked with the following code snippet:
> {code:python}
> import pyspark.sql
>
> spark = (pyspark
>          .sql
>          .SparkSession
>          .builder
>          .master('local[1]')
>          .config("spark.sql.session.timeZone", "UTC")
>          .getOrCreate()
>          )
>
> df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
> df = df.withColumn("ts", df["ts"].astype("timestamp"))
>
> # toPandas() honours the session time zone (UTC)
> print(df.toPandas().iloc[0, 0])
> # collect() returns a datetime rendered in the system's local time zone
> print(df.collect()[0][0])
> {code}
> On my machine this prints the following (the exact result depends on the
> system's timezone; mine is Europe/Berlin):
> {code}
> 2018-06-01 01:00:00
> 2018-06-01 03:00:00
> {code}
> Hence, `toPandas` respected the timezone setting (UTC), but `collect`
> ignored it and converted the timestamp to my system's timezone.
> The cause of this behaviour is that the methods `toInternal` and
> `fromInternal` of PySpark's `TimestampType` class don't take the
> setting `spark.sql.session.timeZone` into account and use the system
> timezone instead.
> If the maintainers agree that this should be fixed, I would try to come up
> with a patch.
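To make the described cause concrete, here is a minimal standalone sketch (plain Python, no PySpark) of the two conversions. It is not PySpark's `TimestampType` code: the helper names `to_internal`/`from_internal` are hypothetical, and it assumes a timestamp is held internally as microseconds since the Unix epoch (UTC). Passing an explicit zone models honouring `spark.sql.session.timeZone`; passing `None` models falling back to the system zone, which is the behaviour reported for `collect`.

```python
import datetime
from typing import Optional

# Assumption: a timestamp is stored as microseconds since the Unix epoch (UTC).
# Hypothetical helpers illustrating the idea, not PySpark's implementation.
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_internal(dt: datetime.datetime, tz: Optional[datetime.tzinfo]) -> int:
    """Convert dt to epoch microseconds.

    A naive dt is read as wall-clock time in tz, or in the system zone if tz is None.
    """
    if dt.tzinfo is None:
        # astimezone() on a naive datetime assumes the system's local time
        dt = dt.replace(tzinfo=tz) if tz is not None else dt.astimezone()
    delta = dt - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_internal(us: int, tz: Optional[datetime.tzinfo]) -> datetime.datetime:
    """Return a naive wall-clock datetime in tz (system zone if tz is None)."""
    aware = EPOCH + datetime.timedelta(microseconds=us)
    # astimezone(None) converts to the system's local zone
    return aware.astimezone(tz).replace(tzinfo=None)


if __name__ == "__main__":
    utc = datetime.timezone.utc
    cest = datetime.timezone(datetime.timedelta(hours=2))  # Europe/Berlin offset in June
    us = to_internal(datetime.datetime(2018, 6, 1, 1, 0, 0), utc)
    print(from_internal(us, utc))   # 2018-06-01 01:00:00 (session zone honoured)
    print(from_internal(us, cest))  # 2018-06-01 03:00:00 (a Berlin system zone)
    print(from_internal(us, None))  # depends on the machine's zone, like collect()
```

A fixed UTC+2 offset stands in for Europe/Berlin in summer so the sketch does not depend on a time zone database being installed; a real fix would presumably resolve the configured zone name instead.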