[jira] [Commented] (HIVE-22006) Hive parquet timestamp compatibility, part 2

Karen Coppage (Jira) Fri, 25 Oct 2019 00:11:49 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-22006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16959495#comment-16959495
 ]


Karen Coppage commented on HIVE-22006:
--------------------------------------

Hi [~h-vetinari],

Unfortunately, introducing a switch (and turning it on) or simply changing 
timestamp writing to be time zone agnostic would make all previously written 
timestamp data unusable.
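
To illustrate the problem described above, here is a minimal Python sketch. It 
is a simplified model, not Hive's actual code: the function names and the 
fixed UTC-8 writer zone are assumptions made for illustration. It assumes the 
existing write path normalizes a local wall-clock value to UTC, so a reader 
that stops converting would return shifted values for old files.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical writer zone for illustration only: a fixed UTC-8 offset.
WRITER_ZONE = timezone(timedelta(hours=-8))


def legacy_write(wall_clock: datetime, zone: timezone) -> datetime:
    """Model of a converting write path: interpret the naive wall-clock
    value as local time in `zone` and store its UTC equivalent (naive)."""
    return wall_clock.replace(tzinfo=zone).astimezone(timezone.utc).replace(tzinfo=None)


def legacy_read(stored: datetime, zone: timezone) -> datetime:
    """Model of a converting read path: undo the UTC normalization."""
    return stored.replace(tzinfo=timezone.utc).astimezone(zone).replace(tzinfo=None)


def agnostic_read(stored: datetime) -> datetime:
    """Model of a time zone agnostic read path: return the value as stored."""
    return stored


original = datetime(2019, 10, 25, 0, 11, 49)
stored = legacy_write(original, WRITER_ZONE)

# The converting reader round-trips the value written by the converting writer.
round_trip = legacy_read(stored, WRITER_ZONE)

# A reader that no longer converts sees the UTC-normalized value instead,
# which differs from the original by the writer's offset.
shifted = agnostic_read(stored)
```

In this model `round_trip` equals `original`, while `shifted` is off by the 
writer's UTC offset, which is the sense in which old data would be misread.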

[~kuczoram] and I have worked on patches (HIVE-21050, HIVE-21215, HIVE-21216) 
that would introduce the option to store Parquet timestamps in a logical type 
that includes metadata indicating that the timestamp is time zone agnostic, 
without breaking backwards compatibility (Hive would correctly read previously 
written timestamps). Sadly, we cannot commit these patches until Parquet 1.11 
is released. Impala also has an implementation for this waiting in the wings. 
If Parquet 1.11 were to be released, and Spark were to also implement the 
feature, then Hive/Impala/Spark could safely work on the same Parquet data, as 
you said.
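
The idea of a metadata flag that keeps old files readable can be sketched as 
follows. This is a hypothetical model and not the code in the patches above: 
`ColumnMeta`, `adjusted_to_utc`, and `read_timestamp` are names invented for 
illustration, loosely modeled on the `isAdjustedToUTC` parameter of Parquet's 
TIMESTAMP logical type. A missing flag stands for a previously written file.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ColumnMeta:
    # Hypothetical stand-in for the logical-type metadata.
    # None models a legacy file written without the annotation.
    adjusted_to_utc: Optional[bool] = None


def read_timestamp(stored: datetime, meta: ColumnMeta, reader_zone: timezone) -> datetime:
    """Return the wall-clock value a reader should present.

    Flag explicitly False: the value is time zone agnostic, return it as is.
    Flag missing or True: keep the legacy behavior and convert from UTC.
    """
    if meta.adjusted_to_utc is False:
        return stored
    return stored.replace(tzinfo=timezone.utc).astimezone(reader_zone).replace(tzinfo=None)


# Hypothetical reader zone for illustration: a fixed UTC-8 offset.
zone = timezone(timedelta(hours=-8))

# Legacy file: value was UTC-normalized on write, so it is converted back.
legacy_value = read_timestamp(datetime(2019, 10, 25, 8, 11, 49), ColumnMeta(), zone)

# New file: value is flagged as time zone agnostic, so it is returned unchanged.
new_value = read_timestamp(
    datetime(2019, 10, 25, 0, 11, 49), ColumnMeta(adjusted_to_utc=False), zone
)
```

Because the reader decides per file from the metadata, both kinds of file 
yield the same wall-clock value in this model.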

I'm not sure about ORC. Timestamps stored as text have always been time zone 
agnostic.

tl;dr: there is a backwards-compatible solution for Parquet, but it is 
currently blocked until the Parquet community releases Parquet 1.11.

> Hive parquet timestamp compatibility, part 2
> --------------------------------------------
>
>                 Key: HIVE-22006
>                 URL: https://issues.apache.org/jira/browse/HIVE-22006
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: All Versions
>            Reporter: H. Vetinari
>            Priority: Major
>
> The interaction between HIVE / IMPALA / SPARK writing timestamps is a major 
> source of headaches in every scenario where such interaction cannot be 
> avoided.
> HIVE-9482 added hive.parquet.timestamp.skip.conversion, which *only* affects 
> the *reading* of timestamps.
> It formulates the next steps as:
> > Later fix will change the write path to not convert, and stop the 
> > read-conversion even for files written by Hive itself.
> At the very least, HIVE needs a switch to also turn off the conversion on 
> writes. That would at least allow a setup where all three of HIVE / IMPALA / 
> SPARK can be configured not to convert on read/write, and can hence safely 
> work on the same data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
