[
https://issues.apache.org/jira/browse/HIVE-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369333#comment-17369333
]
Stamatis Zampetakis commented on HIVE-25104:
--------------------------------------------
[~gupta.nikhil0007] Thanks for putting this list together.
Personally, when I am looking for existing issues I rely mostly on the JQL
search (e.g.,
[query|https://issues.apache.org/jira/issues/?jql=project%20%3D%20hive%20and%20labels%20in%20(timestamp%2Cdatetime)%20order%20by%20created%20DESC])
and not so much on umbrella issues. I use umbrella issues for tightly
connected tasks whose progress I need to track. In the list you outlined above,
most of the tasks are already resolved, so it seems that in this case the
umbrella is purely for classification purposes. If you want to create an
umbrella ticket, feel free to do so.
From my end, I will do a pass over the tickets you mentioned and add some
labels (e.g., timestamp, datetime, compatibility) if they are not already
there. Another thing that may be worth doing is to add a JQL query to the wiki
to help people find date, time, and timestamp related issues.
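
For reference, the search link above decodes to the JQL
project = hive and labels in (timestamp,datetime) order by created DESC.
Below is a minimal Python sketch of how such a link could be generated for a
wiki page. The function name jql_search_url and the constant JIRA_SEARCH are
illustrative only and are not part of any Jira client library.

```python
from urllib.parse import quote

# Base of the Jira issue search URL, taken from the link in this comment.
JIRA_SEARCH = "https://issues.apache.org/jira/issues/?jql="


def jql_search_url(project, labels):
    """Build a Jira search URL for issues in `project` carrying any of `labels`.

    Hypothetical helper for illustration; the JQL shape mirrors the query
    linked in this comment.
    """
    jql = (
        f"project = {project} and labels in ({','.join(labels)}) "
        "order by created DESC"
    )
    # Keep parentheses literal, percent-encode everything else that needs it.
    return JIRA_SEARCH + quote(jql, safe="()")


url = jql_search_url("hive", ["timestamp", "datetime"])
print(url)
```

Adding another label such as compatibility to the list would widen the search
in the same way.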
> Backward incompatible timestamp serialization in Parquet for certain timezones
> ------------------------------------------------------------------------------
>
> Key: HIVE-25104
> URL: https://issues.apache.org/jira/browse/HIVE-25104
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 3.1.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: compatibility, pull-request-available, timestamp
> Fix For: 4.0.0
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> HIVE-12192, HIVE-20007 changed the way that timestamp computations are
> performed and to some extent how timestamps are serialized and deserialized
> in files (Parquet, Avro).
> In versions that include HIVE-12192 or HIVE-20007 the serialization in
> Parquet files is not backwards compatible. In other words, writing timestamps
> with a version of Hive that includes HIVE-12192/HIVE-20007 and reading them
> with a version that does not include them may lead to different
> results depending on the default timezone of the system.
> Consider the following scenario where the default system timezone is set to
> US/Pacific.
> At apache/master commit 37f13b02dff94e310d77febd60f93d5a205254d3
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
> LOCATION '/tmp/hiveexttbl/employee';
> INSERT INTO employee VALUES (1, '1880-01-01 00:00:00');
> INSERT INTO employee VALUES (2, '1884-01-01 00:00:00');
> INSERT INTO employee VALUES (3, '1990-01-01 00:00:00');
> SELECT * FROM employee;
> {code}
> |1|1880-01-01 00:00:00|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> At apache/branch-2.3 commit 324f9faf12d4b91a9359391810cb3312c004d356
> {code:sql}
> CREATE EXTERNAL TABLE employee(eid INT,birth timestamp) STORED AS PARQUET
> LOCATION '/tmp/hiveexttbl/employee';
> SELECT * FROM employee;
> {code}
> |1|1879-12-31 23:52:58|
> |2|1884-01-01 00:00:00|
> |3|1990-01-01 00:00:00|
> The timestamp for {{eid=1}} in branch-2.3 is different from the one in master.
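
The 7 minute 2 second shift for eid=1 is consistent with the writer and the
reader disagreeing on the US/Pacific offset for dates before the 1883 switch
to standard time: the IANA tz database lists local mean time of -7:52:58 for
America/Los_Angeles before that switch, and -8:00 (PST) after it. The Python
sketch below reproduces the arithmetic with those two offsets hard-coded. It
illustrates one likely mechanism under that assumption and is not Hive's
actual conversion code; write_then_read is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: offsets hard-coded from the IANA tz database entry for
# America/Los_Angeles (US/Pacific), so no system tzdata is required.
LMT = timezone(-timedelta(hours=7, minutes=52, seconds=58))  # before 1883 switch
PST = timezone(-timedelta(hours=8))                          # standard time


def write_then_read(wall_clock, writer_offset, reader_offset):
    """Hypothetical round trip: the writer converts a local wall-clock value
    to a UTC instant with its offset, the reader converts that instant back
    to wall-clock time with its own offset."""
    instant = wall_clock.replace(tzinfo=writer_offset).astimezone(timezone.utc)
    return instant.astimezone(reader_offset).replace(tzinfo=None)


# eid=1: writer applies local mean time, reader applies -8:00.
row1 = write_then_read(datetime(1880, 1, 1), LMT, PST)
# eid=2 and eid=3: both sides agree on -8:00, so the value survives.
row2 = write_then_read(datetime(1884, 1, 1), PST, PST)
row3 = write_then_read(datetime(1990, 1, 1), PST, PST)

for eid, value in enumerate((row1, row2, row3), start=1):
    print(eid, value)
```

Under these assumptions only the pre-1883 value moves, which matches the
branch-2.3 output above.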