The goal of WriterVersion is to record changes to the writer software so that readers can cope with bugs that are not yet known when a file is written. It is not intended to mark format changes. A good example is the switch from the row-by-row writer to the vectorized writer in HIVE-12055. That changed the implementation of the writer, but didn't change the format. If the change had introduced a bug, the recorded writer version would tell the reader which files it had to compensate for.
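To make the idea concrete, here is a minimal sketch of how a reader could use the recorded writer version. It is not the actual ORC or Hive reader code: the class name, `predates`, and `timestampStatsAreUtc` are hypothetical names chosen for illustration, and only the ids 4 (HIVE_13083) and 6 (ORC_135) come from this thread. The point it illustrates is that an id the reader has never seen can be treated as "includes every change I know about" instead of causing a failure.

```java
// Illustrative sketch only; these names are hypothetical, not the ORC API.
class WriterVersionSketch {
    // Ids as quoted in this thread.
    static final int HIVE_13083 = 4;
    static final int ORC_135 = 6; // "timestamp stats use utc"

    /** True if a file with this writer version id was written before the given change. */
    static boolean predates(int fileWriterVersion, int change) {
        return fileWriterVersion < change;
    }

    /**
     * Hypothetical reader decision: may timestamp statistics be read as UTC?
     * Only files whose writer version includes ORC_135 qualify.
     */
    static boolean timestampStatsAreUtc(int fileWriterVersion) {
        return !predates(fileWriterVersion, ORC_135);
    }

    public static void main(String[] args) {
        // 9 stands in for a future writer version this reader does not know.
        int[] ids = {HIVE_13083, ORC_135, 9};
        for (int id : ids) {
            System.out.println("writerVersion=" + id
                + " timestampStatsAreUtc=" + timestampStatsAreUtc(id));
        }
    }
}
```

Comparing ids instead of looking them up in a closed list is what keeps the reader from failing on a higher version. Under this model, a file labeled 4 is treated as predating ORC_135 no matter what the writer actually did.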
If the older versions of Hive are broken with higher writer versions, we absolutely should fix that. I seem to remember fixing that at some point, but I probably didn't push it back into Hive 1.x. In which version did you see the problem?

.. Owen

On Wed, Feb 27, 2019 at 9:43 AM Dain Sundstrom <[email protected]> wrote:

> Hi, we recently updated to Hive 3.0+ and have noticed some issues with
> older versions of Hive being able to read data written by newer versions of
> ORC. Specifically, older readers only understand writer version up to 4
> and newer versions write 6. This causes older readers to fail. I see that
> the workaround is to set
> `WriterOptions.writerVersion(WriterVersion.HIVE_13083)`, which causes the
> writer to put a `4` in the postscript, but doesn’t seem to change anything
> else in the writer’s behavior. My question is, did I miss something in
> the writer where behavior changes based on version? If not, does that
> work? I ask because newer versions have comments like `ORC_135(6) =>
> timestamp stats use utc`, which to me would seem to require that the
> behavior changes.
>
> Thanks,
>
> -dain
>
> ----
> Dain Sundstrom
> Co-founder @ Presto Software Foundation, Co-creator of Presto (
> https://prestosql.io)
