[
https://issues.apache.org/jira/browse/SPARK-51734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Abhinav Koul updated SPARK-51734:
---------------------------------
Priority: Major (was: Minor)
> Wrong results when reading ORC Timestamp type with different Reader/Writer
> Timezones
> ------------------------------------------------------------------------------------
>
> Key: SPARK-51734
> URL: https://issues.apache.org/jira/browse/SPARK-51734
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.5.1
> Reporter: Abhinav Koul
> Priority: Major
>
> When reading an ORC TimestampLTZ (timestamp with local time zone) column,
> Spark returns incorrect values if the reader and writer time zones differ.
> Steps to reproduce:
> {code:java}
> import java.util.TimeZone
>
> // Write one row with Europe/Berlin as both JVM default and session time zone.
> TimeZone.setDefault(TimeZone.getTimeZone("Europe/Berlin"))
> sql("SET spark.sql.session.timeZone = Europe/Berlin")
> sql("DROP TABLE IF EXISTS t")
> sql("CREATE TABLE t (tz TIMESTAMP) USING hive OPTIONS(fileFormat 'orc')")
> sql("INSERT INTO t VALUES (TIMESTAMP('1996-08-02 09:00:00.723100809'))")
>
> // Read it back with Asia/Kolkata as both JVM default and session time zone.
> TimeZone.setDefault(TimeZone.getTimeZone("Asia/Kolkata"))
> sql("SET spark.sql.session.timeZone = Asia/Kolkata")
> spark.table("t").collect()
> {code}
> Comparing the results of the above query for ORC and Parquet, I found the following:
> || ||Parquet (ms)||ORC (ms)||Parquet (Timestamp)||ORC (Timestamp)||
> |Spark to Fileformat Writer|838969200723|838969200723|1996-08-02 09:00:00.723100809|1996-08-02 09:00:00.723100809|
> |Fileformat Reader to Spark|838969200723|838956600723|1996-08-02 12:30:00.723100809|1996-08-02 09:00:00.723100809|
> Inside the ORC reader, I found that ORC does read the correct millisecond
> value of 838969200723 but deliberately adds the (writer TZ - reader TZ)
> offset to it (-12600000 ms, i.e. -3 h 30 min).
> As I understand it, Parquet's behaviour is the correct one: the timestamp
> should be adjusted to the reader's time zone rather than showing the same
> wall-clock time, as ORC currently does. Please suggest what can be done
> further here.
>
>
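The figures in the report can be cross-checked with a small, self-contained Java sketch. It only models the arithmetic described above (stored value plus writer offset minus reader offset) and is not ORC's actual reader code; the class and method names (OrcOffsetSketch, offsetMillis, shiftByZoneDifference, wallClock) are hypothetical, and it assumes both offsets are taken at the stored instant.

```java
import java.time.Instant;
import java.time.ZoneId;

final class OrcOffsetSketch {
    // UTC offset of `zone`, in milliseconds, at the given instant.
    static long offsetMillis(String zone, long epochMillis) {
        return ZoneId.of(zone).getRules()
                .getOffset(Instant.ofEpochMilli(epochMillis))
                .getTotalSeconds() * 1000L;
    }

    // Hypothetical model of the adjustment described in the report:
    // stored value + (writer offset - reader offset).
    static long shiftByZoneDifference(long epochMillis, String writerZone, String readerZone) {
        return epochMillis
                + offsetMillis(writerZone, epochMillis)
                - offsetMillis(readerZone, epochMillis);
    }

    // Wall-clock rendering of an epoch-millisecond value in `zone`.
    static String wallClock(long epochMillis, String zone) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of(zone))
                .toLocalDateTime()
                .toString();
    }

    public static void main(String[] args) {
        long written = 838969200723L; // value from the table in the report
        long shifted = shiftByZoneDifference(written, "Europe/Berlin", "Asia/Kolkata");
        System.out.println(shifted);                            // 838956600723
        System.out.println(wallClock(written, "Asia/Kolkata")); // 1996-08-02T12:30:00.723
        System.out.println(wallClock(shifted, "Asia/Kolkata")); // 1996-08-02T09:00:00.723
    }
}
```

In August 1996 Europe/Berlin is at UTC+02:00 and Asia/Kolkata at UTC+05:30, so the difference is -12600000 ms. The unshifted value renders as 12:30 in Asia/Kolkata (the Parquet column of the table), while the shifted value renders as 09:00 (the ORC column). The sketch covers milliseconds only, so the sub-millisecond digits of the original timestamp are not shown.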