Hi,

> > Let me make sure I have the backwards compatibility straight.  If a user
> > switches to ORC 2.0, he could choose to continue writing in older formats
> > so that his old tools could read it
>
>    Yes, exactly.

To chime in on Owen's point, the development process has a slight wrinkle in 
it, which we avoided in the 0.11 -> 0.12 migration due to ORC being embedded in 
Hive.

The feature addition is two-fold. On the write side, the new features are 
available only when a user flips the writer version.

There is no feature flag for reader versions, so the readers have to keep up to 
date with the writer changes (or just fail for the "blackholed" ones, with good 
errors).
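
To make the asymmetry concrete, here is a rough sketch in Python (not ORC 
code). The enum, the exception and both function names are made up for 
illustration, and the set of versions a reader build supports is an assumption. 
The point is only that features hang off the writer version, while the reader 
has no switch and must either understand the file or fail with a clear message.

```python
from enum import Enum


class WriterVersion(Enum):
    # Hypothetical labels for the versions mentioned in this thread.
    V_0_11 = (0, 11)
    V_0_12 = (0, 12)
    V_1_5 = (1, 5)   # experimental, meant for integration testing only
    V_2_0 = (2, 0)


# Assumption: what one particular reader build understands.
# The experimental 1.5 is deliberately left out ("blackholed").
SUPPORTED_BY_READER = {
    WriterVersion.V_0_11,
    WriterVersion.V_0_12,
    WriterVersion.V_2_0,
}


class UnsupportedWriterVersion(Exception):
    """Raised when a file comes from a writer version the reader cannot handle."""


def new_features_enabled(writer_version):
    """New features are gated purely on the writer version the user picked."""
    return writer_version.value >= WriterVersion.V_1_5.value


def open_for_read(file_writer_version):
    """No reader-side flag: understand the file or fail with a good error."""
    if file_writer_version not in SUPPORTED_BY_READER:
        major, minor = file_writer_version.value
        raise UnsupportedWriterVersion(
            f"file was written with writer version {major}.{minor}, "
            "which this reader does not support"
        )
    return f"reading {file_writer_version.name}"
```

A user who never flips the writer version keeps producing 0.12 files that every 
reader in this sketch accepts, and a file from a blackholed version is rejected 
up front instead of being misread.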

Due to the split between projects, I expect to see a two-step development 
cycle, to clean up the integration pathways before the ABI is frozen in 2.0.

The entire process can be gated on the writer version - during the development 
process, there will be an experimental version (1.5?) and a stable version.

I have no interest in ever supporting actual 1.5-version data in ORC, but for 
the sake of integration testing the 1.5 -> 2.0 writer versions are extremely 
useful stepping stones for a dependency shared by multiple projects, like ORC.

Once the integrations are all complete and the format can be frozen, ORC 2.0 
releases can still hold back the default writer version from being upgraded for 
another stable release.

After the ecosystem has had all its upgrades, the default version gets flipped 
to 2.0. The ability to write 0.12 files will still remain as an option, but all 
intermediate reader versions will get dropped.

That's a bit more complicated than being part of Hive and sync'ing releases, 
but I think this gives ORC the flexibility to accept contributions from a wide 
community, supporting multi-project release timelines, without leaving the 
implementation full of reader implementations for many writer versions.

Cheers,
Gopal 

