Re: WriterOptions.writerVersion(version)?

Owen O'Malley Fri, 01 Mar 2019 15:29:04 -0800

The goal of WriterVersion is to record changes to the writer software so
that the readers can cope with unknown bugs. It is not intended to mark
format changes. A good example of this is when we switched from the
row-by-row writer to the vectorized writer in HIVE-12055. This changed the
implementation of the writer, but didn't change the format. If the change
had introduced a bug, we'd know that the reader had to compensate.
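
To make that concrete, here is a minimal sketch of how a reader might use the writer version to compensate for a bug. It is not the real ORC class: the enum and class names are hypothetical, the ids 4 and 6 come from this thread, the id 3 for HIVE_12055 is an assumption, and the "bug fixed in HIVE_13083" is invented purely for illustration.

```java
// Hypothetical stand-in for a writer-version enum; NOT the real ORC class.
enum SketchWriterVersion {
    HIVE_12055(3),  // id assumed for illustration
    HIVE_13083(4),  // id taken from the thread
    ORC_135(6);     // id taken from the thread

    private final int id;

    SketchWriterVersion(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    // True if the writer that produced the file is at least as new as 'other'.
    boolean includes(SketchWriterVersion other) {
        return id >= other.id;
    }
}

final class ReaderCompensationSketch {
    // Assumption for illustration only: pretend some writer bug was fixed in
    // HIVE_13083, so files from older writers need a reader-side workaround.
    static boolean needsWorkaround(SketchWriterVersion fileVersion) {
        return !fileVersion.includes(SketchWriterVersion.HIVE_13083);
    }

    public static void main(String[] args) {
        for (SketchWriterVersion v : SketchWriterVersion.values()) {
            System.out.println(v + " needs workaround: " + needsWorkaround(v));
        }
    }
}
```

The point of the sketch is that the file format is the same in every case; only the reader's handling changes, keyed off which writer software produced the file.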


If the older versions of Hive are broken with higher writer versions, we
absolutely should fix that. I seem to remember fixing that at some point,
but I probably didn't push it back into Hive 1.x. In which version did you see
the problem?
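
For illustration, one way a reader could tolerate writer versions it does not know is to map any unrecognized id to a sentinel instead of failing. This is only a sketch under stated assumptions, not necessarily what ORC or Hive actually does: the enum name, the fromId helper, and the id 3 for HIVE_12055 are hypothetical, and the reader modeled here only knows versions up to 4.

```java
// Hypothetical model of an older reader that only knows writer versions up to 4.
enum TolerantWriterVersion {
    HIVE_12055(3),              // id assumed for illustration
    HIVE_13083(4),              // id taken from the thread
    FUTURE(Integer.MAX_VALUE);  // sentinel for ids this reader does not recognize

    private final int id;

    TolerantWriterVersion(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    // Map the id stored in the file to a known version. Unrecognized ids map
    // to FUTURE rather than causing a failure.
    static TolerantWriterVersion fromId(int id) {
        for (TolerantWriterVersion v : values()) {
            if (v != FUTURE && v.id == id) {
                return v;
            }
        }
        return FUTURE;
    }
}
```

With this approach, a file whose postscript carries 6 would be treated as FUTURE by the old reader, which is safe as long as the writer version really does not signal a format change.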

.. Owen

On Wed, Feb 27, 2019 at 9:43 AM Dain Sundstrom <[email protected]> wrote:

> Hi, we recently updated to Hive 3.0+ and have noticed some issues with
> older versions of Hive being able to read data written by newer versions of
> ORC.  Specifically, older readers only understand writer version up to 4
> and newer versions write 6.  This causes older readers to fail.  I see that
> the workaround is to set
> `WriterOptions.writerVersion(WriterVersion.HIVE_13083)`, which causes the
> writer to put a `4` in the postscript, but doesn’t seem to change anything
> else in the writer’s behavior.  My question is, did I miss something in
> the writer where behavior changes based on version?  If not, does that
> work?  I ask because newer versions have comments like `ORC_135(6) =>
> timestamp stats use utc`, which to me would seem to require that the
> behavior changes.
>
> Thanks,
>
> -dain
>
> ----
> Dain Sundstrom
> Co-founder @ Presto Software Foundation, Co-creator of Presto (
> https://prestosql.io)
>
>
