Ok, I created ORC-229 https://issues.apache.org/jira/browse/ORC-229 so that
we'll have a new OrcFile.Version of UNSTABLE-PRE-2.0. If you look at the
associated pull request, you can see the comments in the code are pretty
clear that users should stay away. I also added a logged warning when the
writer uses that version.

My intention is that we can iterate on the UNSTABLE-PRE-2.0 format without
cross-version compatibility. It will only be used for developer testing. As
part of the ORC 2.0 release, we can delete that version and move to a new
2.0 version.
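
For illustration only, here is a minimal sketch of how a writer could gate on
such a version and log a warning. It is not the code from the pull request:
the class FileVersionSketch, the stable flag, and the methods byName and
chooseWriterVersion are all hypothetical names invented for this example.

```java
import java.util.logging.Logger;

// Hypothetical sketch; not the actual ORC source or API.
final class FileVersionSketch {
    private static final Logger LOG =
        Logger.getLogger(FileVersionSketch.class.getName());

    // File format versions a writer may be asked to produce.
    enum Version {
        V_0_11("0.11", true),
        V_0_12("0.12", true),
        // Developer testing only: no cross-version compatibility is promised.
        UNSTABLE_PRE_2_0("UNSTABLE-PRE-2.0", false);

        private final String label;
        private final boolean stable;

        Version(String label, boolean stable) {
            this.label = label;
            this.stable = stable;
        }

        String getLabel() {
            return label;
        }

        boolean isStable() {
            return stable;
        }

        // Look up a version by its label, failing loudly on unknown input.
        static Version byName(String label) {
            for (Version v : values()) {
                if (v.label.equals(label)) {
                    return v;
                }
            }
            throw new IllegalArgumentException("Unknown file version: " + label);
        }
    }

    // Resolve the requested writer version; warn if it is the unstable one.
    static Version chooseWriterVersion(String requested) {
        Version v = Version.byName(requested);
        if (!v.isStable()) {
            LOG.warning("Writing with file version " + v.getLabel()
                + "; files may become unreadable by later releases.");
        }
        return v;
    }
}
```

Deleting the unstable constant at release time then turns every remaining use
of it into a compile error, which is one way to make sure it does not ship.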

Thoughts?

.. Owen

On Tue, Aug 8, 2017 at 12:13 AM, Gopal Vijayaraghavan <[email protected]>
wrote:

> Hi,
>
> > > Let me make sure I have the backwards compatibility straight.  If a user
> > > switches to ORC 2.0, he could choose to continue writing in older formats
> > > so that his old tools could read it
> >
> >    Yes, exactly.
>
> To chime in on Owen's point, the development process has a slight wrinkle
> in it, which we avoided in the 0.11 -> 0.12 migration due to ORC being
> embedded in Hive.
>
> The feature addition is two-fold - the new features are available only
> when a user flips the writer versions.
>
> There is no feature flag for reader versions, so the readers have to keep
> up to date with the writer changes (or just fail for the "blackholed" ones,
> with good errors).
>
> Due to the split between projects, I expect to see a two-step development
> cycle, to clean up the integration pathways before the ABI is frozen in 2.0.
>
> The entire process can be gated on the writer version - during the
> development process, there will be an experimental version (1.5?) and a
> stable version.
>
> I have no interest in ever supporting an actual 1.5 version data setup in
> ORC, but for the sake of integration testing the 1.5->2.0 writer versions
> are extremely useful stepping stones towards a multi-project dependency
> like ORC.
>
> Once the integrations are all complete and the format can be frozen, ORC
> 2.0 releases can still disable the default writer version from being
> upgraded for another stable release.
>
> After the ecosystem has had all its upgrades, the default version gets
> flipped to 2.0. The ability to write 0.12 files will still remain as an
> option, and all intermediate reader versions will get dropped.
>
> That's a bit more complicated than being part of Hive and sync'ing
> releases, but I think this gives ORC the flexibility to accept
> contributions from a wide community, supporting multi-project release
> timelines, without leaving the implementation full of reader
> implementations for many writer versions.
>
> Cheers,
> Gopal
>
>
>
