[ https://issues.apache.org/jira/browse/HIVE-20980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740613#comment-16740613 ]

Jesus Camacho Rodriguez commented on HIVE-20980:
------------------------------------------------

[~klcopp], thanks for your patch. I do not think we have reached an agreement 
on how to move forward with timestamp types (at least in Hive). What I do 
not like about the current proposal is that the same type (timestamp) would 
still have different semantics (local date-time vs. instant) depending on the 
storage format used for the table (e.g., text/ORC vs. Parquet/Avro). That seems 
a shaky foundation going forward. Separately, I have already described my 
concerns about having four types where 'timestamp without time zone' does not 
have the same semantics as 'timestamp'.
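
To make the two semantics concrete, here is a minimal sketch that uses Java's java.time types as stand-ins: LocalDateTime for local date-time semantics and Instant for instant semantics. This is an analogy only, not how Hive represents its types internally; the class TimestampSemantics, its methods, and the zones America/Los_Angeles and Europe/Budapest are hypothetical choices for illustration.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class TimestampSemantics {
    // Local date-time semantics: the stored value is a wall-clock reading
    // with no zone attached, so every reader sees the same fields.
    static String readLocalDateTime(LocalDateTime stored, ZoneId readerZone) {
        return stored.toString(); // readerZone is deliberately ignored
    }

    // Instant semantics: the stored value is a point on the UTC time line,
    // so each reader renders it in its own zone.
    static String readInstant(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone).toString();
    }

    public static void main(String[] args) {
        LocalDateTime wallClock = LocalDateTime.of(2018, 1, 1, 10, 0);
        // A writer in America/Los_Angeles stores the same value both ways.
        Instant instant = wallClock.atZone(ZoneId.of("America/Los_Angeles")).toInstant();
        for (String zone : new String[] {"America/Los_Angeles", "Europe/Budapest"}) {
            ZoneId z = ZoneId.of(zone);
            System.out.println(zone + ": local=" + readLocalDateTime(wallClock, z)
                    + " instant=" + readInstant(instant, z));
        }
    }
}
```

With local date-time semantics both readers see 2018-01-01T10:00; with instant semantics the Budapest reader sees 2018-01-01T19:00. That is the kind of discrepancy users would observe if the same column type behaved differently per storage format.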

The patch reimplements 'timestamp with local time zone' semantics in 
'timestamp', and it even relies on the session time zone rather than the 
system time zone, which AFAIK was not done before.
Instead of following that path, would it be possible to provide an upgrade path 
from < 3.x to 3.x in which the column type of tables stored as Parquet is 
altered during the upgrade ('timestamp' -> 'timestamp with local time zone')? 
The Parquet writer/reader could then choose how to store the 'timestamp with 
local time zone' type internally; e.g., to remain compatible with legacy 
readers, it could store it as a timestamp. That would provide consistent 
semantics going forward as well as backward compatibility, although DDL 
statements written before version 3.x would need to be modified if instant 
semantics are required. Is that reasonable? Is there anything I am missing?
Cc [~zi] [~owen.omalley]

> Reinstate Parquet timestamp conversion between HS2 time zone and UTC
> --------------------------------------------------------------------
>
>                 Key: HIVE-20980
>                 URL: https://issues.apache.org/jira/browse/HIVE-20980
>             Project: Hive
>          Issue Type: Sub-task
>          Components: File Formats
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>         Attachments: HIVE-20980.1.patch, HIVE-20980.2.patch, 
> HIVE-20980.2.patch
>
>
> With HIVE-20007, Parquet timestamps became timezone-agnostic. This means that 
> timestamps written after the change are read exactly as they were written, 
> but timestamps stored before this change are effectively converted from the 
> writing HS2 server's time zone to GMT. This patch reinstates the original 
> behavior: timestamps are converted to UTC before they are written and from 
> UTC after they are read.
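
The conversion described in the issue summary can be sketched as follows, again using java.time.LocalDateTime as a stand-in for a Hive timestamp. The class ParquetTimestampConversion and its methods are hypothetical; the sketch only models the wall-clock arithmetic and ignores how Parquet physically encodes the value.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class ParquetTimestampConversion {
    // Write path (hypothetical): interpret the wall-clock value in the writing
    // HS2 server's zone and store the corresponding UTC wall-clock value.
    static LocalDateTime toUtcOnWrite(LocalDateTime value, ZoneId writerZone) {
        return value.atZone(writerZone).withZoneSameInstant(ZoneOffset.UTC).toLocalDateTime();
    }

    // Read path with conversion: treat the stored value as UTC and convert it
    // back to the reading server's zone.
    static LocalDateTime fromUtcOnRead(LocalDateTime stored, ZoneId readerZone) {
        return stored.atZone(ZoneOffset.UTC).withZoneSameInstant(readerZone).toLocalDateTime();
    }

    // Read path without conversion (timezone-agnostic): return the stored value as-is.
    static LocalDateTime readAgnostic(LocalDateTime stored) {
        return stored;
    }

    public static void main(String[] args) {
        ZoneId hs2Zone = ZoneId.of("America/Los_Angeles");
        LocalDateTime written = LocalDateTime.of(2018, 1, 1, 10, 0);
        LocalDateTime stored = toUtcOnWrite(written, hs2Zone);

        // Converting on read restores the original wall-clock value.
        System.out.println("with conversion:    " + fromUtcOnRead(stored, hs2Zone));
        // A timezone-agnostic reader of legacy data sees the UTC value instead,
        // i.e. the value shifted by the writer's UTC offset.
        System.out.println("without conversion: " + readAgnostic(stored));
    }
}
```

With America/Los_Angeles as the HS2 zone, the stored value is 2018-01-01T18:00; reading it back with conversion yields the original 2018-01-01T10:00, while a timezone-agnostic reader sees 2018-01-01T18:00, which is the shift to GMT that the issue describes for legacy data.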



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
