[
https://issues.apache.org/jira/browse/HIVE-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742233#comment-16742233
]
Zoltan Ivanfi commented on HIVE-20980:
--------------------------------------
[~jcamachorodriguez] The addition of session-local time zones was orthogonal to
the semantics change and it seemed to make sense to restore the timezone-aware
semantics based on the session-local time zone rather than the server time
zone. That being said, I do not have a strong preference towards either one, so
if you prefer one over the other, we are fine with your choice.
There is indeed an isAdjustedToUTC parameter in parquet-format, which will be
made available in the upcoming parquet-mr 1.11.0 release. It is also one of the
reasons why I would prefer the TIMESTAMP and TIMESTAMP WITHOUT TIME ZONE types
to behave differently for Parquet. The isAdjustedToUTC parameter annotates int64
timestamps, while previously we used int96 timestamps. Writing int64 timestamps
is a breaking change in itself, so it should only be done at the user's
explicit request. However, a configuration switch would not suffice for this
purpose, because the need to write backwards-compatible int96 timestamps
for any single table would prevent every other table from using the new int64
timestamps as well.
At the same time, introducing new semantics for timestamps breaks the existing
rule that an int96 written by Impala is LocalDateTime but an int96 written by
Hive or Spark is Instant. To prevent further confusion, the new semantics
should never be written into int96 timestamps, only int64 ones, because the
latter allow saving semantics metadata in the isAdjustedToUTC type parameter.
Having the old TIMESTAMP type behave in the legacy way and writing only int64
timestamps with the new TIMESTAMP WITH LOCAL TIME ZONE type resolves these two
problems in a nice way. (Please see [this
appendix|https://docs.google.com/document/d/1gNRww9mZJcHvUDCXklzjFEQGpefsuR_akCDfWsdE35Q/edit#heading=h.gonr2yqv3e77]
of the proposal for details.) It is true that TIMESTAMP will behave
differently between different file formats again, but that inconsistency has
historically been a part of Hive and fixing that would be a breaking change.
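To make the difference between the two semantics concrete, here is a minimal sketch using only java.time. It is not Hive or parquet-mr code: the class and method names are hypothetical, the stored value is plain epoch milliseconds rather than an actual int96 or int64 Parquet encoding, and the two time zones are arbitrary picks for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical illustration class; not part of Hive or parquet-mr.
class TimestampSemantics {

    // Instant semantics: interpret the wall-clock value in the writer's zone
    // and normalize it to UTC before storing.
    static long writeAsInstant(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant().toEpochMilli();
    }

    // Instant semantics on read: convert the stored UTC value to the reader's zone.
    static LocalDateTime readAsInstant(long storedMillis, ZoneId readerZone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(storedMillis), readerZone);
    }

    // LocalDateTime semantics: store the wall-clock fields as they are,
    // with no time zone involved (UTC is used only as a fixed encoding offset).
    static long writeAsLocalDateTime(LocalDateTime wallClock) {
        return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // LocalDateTime semantics on read: decode the same fields back, zone-independent.
    static LocalDateTime readAsLocalDateTime(long storedMillis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(storedMillis), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        LocalDateTime written = LocalDateTime.of(2019, 1, 14, 12, 0);
        ZoneId writer = ZoneId.of("Europe/Budapest");     // assumed writer zone
        ZoneId reader = ZoneId.of("America/Los_Angeles"); // assumed reader zone

        // Instant semantics: the reader sees the same moment shifted to its own zone.
        System.out.println(readAsInstant(writeAsInstant(written, writer), reader));
        // prints 2019-01-14T03:00

        // LocalDateTime semantics: the reader sees exactly what was written.
        System.out.println(readAsLocalDateTime(writeAsLocalDateTime(written)));
        // prints 2019-01-14T12:00
    }
}
```

The stored number alone does not say which of the two read paths applies, which is why a type parameter such as isAdjustedToUTC is needed to record the intended semantics alongside the data.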
> Reinstate Parquet timestamp conversion between HS2 time zone and UTC
> --------------------------------------------------------------------
>
> Key: HIVE-20980
> URL: https://issues.apache.org/jira/browse/HIVE-20980
> Project: Hive
> Issue Type: Sub-task
> Components: File Formats
> Reporter: Karen Coppage
> Assignee: Karen Coppage
> Priority: Major
> Attachments: HIVE-20980.1.patch, HIVE-20980.2.patch,
> HIVE-20980.2.patch
>
>
> With HIVE-20007, Parquet timestamps became timezone-agnostic. This means that
> timestamps written after the change are read exactly as they were written;
> but timestamps stored before this change are effectively converted from the
> writing HS2 server time zone to GMT time zone. This patch reinstates the
> original behavior: timestamps are converted to UTC before write and from UTC
> before read.