[
https://issues.apache.org/jira/browse/HIVE-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stamatis Zampetakis updated HIVE-21291:
---------------------------------------
Labels: compatibility timestamp (was: )
> Restore historical way of handling timestamps in Avro while keeping the new
> semantics at the same time
> ------------------------------------------------------------------------------------------------------
>
> Key: HIVE-21291
> URL: https://issues.apache.org/jira/browse/HIVE-21291
> Project: Hive
> Issue Type: Sub-task
> Reporter: Zoltan Ivanfi
> Assignee: Karen Coppage
> Priority: Major
> Labels: compatibility, timestamp
> Fix For: 3.1.2, 3.2.0, 4.0.0
>
> Attachments: HIVE-21291.1.patch, HIVE-21291.2.patch,
> HIVE-21291.3.patch, HIVE-21291.4.patch, HIVE-21291.4.patch,
> HIVE-21291.5.patch, HIVE-21291.6.patch, HIVE-21291.7.patch,
> HIVE-21291.7.patch, HIVE-21291.branch-3.1.patch, HIVE-21291.branch-3.patch
>
>
> This sub-task is for implementing the Avro-specific parts of the following
> plan:
> h1. Problem
> Historically, the semantics of the TIMESTAMP type in Hive depended on the
> file format. Timestamps in Avro, Parquet and RCFiles with a binary SerDe had
> _Instant_ semantics, while timestamps in ORC, textfiles and RCFiles with a
> text SerDe had _LocalDateTime_ semantics.
> The Hive community wanted to get rid of this inconsistency and have
> _LocalDateTime_ semantics in Avro, Parquet and RCFiles with a binary SerDe as
> well. *Hive 3.1 turned off normalization to UTC* to achieve this. While this
> leads to the desired new semantics, it also leads to incorrect results in
> two cases: when new Hive versions read timestamps written by old Hive
> versions, and when old Hive versions or any other component not aware of
> this change (including legacy Impala and Spark versions) read timestamps
> written by new Hive versions.
> h1. Solution
> To work around this issue, Hive *should restore the practice of normalizing
> to UTC* when writing timestamps to Avro, Parquet and RCFiles with a binary
> SerDe. In itself, this would restore the historical _Instant_ semantics,
> which is undesirable. In order to achieve the desired _LocalDateTime_
> semantics in spite of normalizing to UTC, newer Hive versions should record
> the session-local time zone in the file metadata fields that serve as
> arbitrary key-value storage.
> When reading back files with this time zone metadata, newer Hive versions (or
> any other new component aware of this extra metadata) can achieve
> _LocalDateTime_ semantics by *converting from UTC to the saved time zone
> (instead of to the local time zone)*. Legacy components that are unaware of
> the new metadata can read the files without any problem and the timestamps
> will show the historical _Instant_ behaviour to them.
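
The write and read paths described in the Solution section can be sketched roughly as follows. This is an illustration using plain java.time only, not the code from the attached patches: the class name, the method names and the metadata key "writer.time.zone" are hypothetical, and the file metadata is modelled as a simple map.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;

final class TimestampRoundTrip {
    // Hypothetical key name; the issue text does not say which key Hive uses.
    static final String WRITER_ZONE_KEY = "writer.time.zone";

    // Write path: normalize the wall-clock value to UTC and record the
    // session-local time zone in the file's key-value metadata.
    public static long write(LocalDateTime wallClock, ZoneId sessionZone,
                             Map<String, String> fileMetadata) {
        fileMetadata.put(WRITER_ZONE_KEY, sessionZone.getId());
        return wallClock.atZone(sessionZone).toInstant().toEpochMilli();
    }

    // New reader: convert from UTC to the saved zone, which restores the
    // original wall-clock value (LocalDateTime semantics). If the metadata
    // is missing, fall back to the reader's own zone.
    public static LocalDateTime read(long epochMillisUtc,
                                     Map<String, String> fileMetadata,
                                     ZoneId readerZone) {
        String saved = fileMetadata.get(WRITER_ZONE_KEY);
        ZoneId zone = (saved != null) ? ZoneId.of(saved) : readerZone;
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillisUtc), zone);
    }

    // Legacy reader: unaware of the metadata, always converts from UTC to
    // its own local zone (historical Instant semantics).
    public static LocalDateTime legacyRead(long epochMillisUtc, ZoneId readerZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillisUtc), readerZone);
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        ZoneId writerZone = ZoneId.of("America/Los_Angeles");
        ZoneId readerZone = ZoneId.of("Europe/Budapest");

        long stored = write(LocalDateTime.of(2019, 2, 19, 10, 30), writerZone, meta);

        System.out.println(Instant.ofEpochMilli(stored));   // 2019-02-19T18:30:00Z
        System.out.println(read(stored, meta, readerZone)); // 2019-02-19T10:30
        System.out.println(legacyRead(stored, readerZone)); // 2019-02-19T19:30
    }
}
```

In this sketch the new reader recovers 10:30 regardless of where it runs, while the legacy reader in a different zone sees the same instant rendered in its own local time, which matches the behaviour both kinds of reader had before Hive 3.1.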