Hi,

> > Let me make sure I have the backwards compatibility straight. If a user
> > switches to ORC 2.0, he could choose to continue writing in older formats
> > so that his old tools could read it
> > Yes, exactly.

To chime in on Owen's point, the development process has a slight wrinkle in it, which we avoided in the 0.11 -> 0.12 migration because ORC was embedded in Hive.

The feature addition has two sides. On the write side, the new features are available only when a user flips the writer version. On the read side, there is no feature flag, so the readers have to keep up to date with the writer changes (or just fail for the "blackholed" ones, with good errors).

Due to the split between projects, I expect to see a two-step development cycle, to clean up the integration pathways before the ABI is frozen in 2.0.

The entire process can be gated on the writer version. During the development process, there will be an experimental version (1.5?) and a stable version. I have no interest in ever supporting an actual 1.5 version data setup in ORC, but for the sake of integration testing, the 1.5 -> 2.0 writer versions are extremely useful stepping stones for a multi-project dependency like ORC.

Once the integrations are all complete and the format can be frozen, ORC 2.0 releases can still hold back the default writer version from being upgraded for another stable release.

After the ecosystem has had all its upgrades, the default version gets flipped to 2.0. The ability to write 0.12 files will still remain as an option, but reader support for all intermediate versions will get dropped.

That's a bit more complicated than being part of Hive and sync'ing releases, but I think this gives ORC the flexibility to accept contributions from a wide community and to support multi-project release timelines, without leaving the implementation full of reader code paths for many writer versions.

Cheers,
Gopal
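
P.S. To make the gating concrete, here is a rough Python sketch of the idea. It is only an illustration under my own assumptions: none of these names (WriterVersion, pick_writer_version, check_readable) exist in ORC, and the 1.5 label is just the placeholder experimental version mentioned above.

```python
from enum import Enum


class WriterVersion(Enum):
    """Hypothetical writer versions; the names and numbers are illustrative."""
    V_0_12 = (0, 12)  # existing stable format, stays writable as an option
    V_1_5 = (1, 5)    # experimental stepping stone, never a supported format
    V_2_0 = (2, 0)    # the format once it is frozen

    @property
    def label(self):
        return ".".join(str(part) for part in self.value)


class UnsupportedWriterVersion(Exception):
    """Raised when a reader meets a file version it deliberately does not handle."""


def pick_writer_version(default, requested=None):
    # New features are only used when the user explicitly flips the writer
    # version; otherwise the release's default applies.
    return default if requested is None else requested


def check_readable(file_version, supported):
    # There is no reader-side flag: a reader either knows the version or
    # fails with an error that says what it found and what it supports.
    if file_version not in supported:
        known = ", ".join(sorted(v.label for v in supported))
        raise UnsupportedWriterVersion(
            f"file was written with writer version {file_version.label}; "
            f"this reader supports: {known}"
        )
    return file_version


# Reader support during development vs. after the format is frozen.
DEV_READER = frozenset({WriterVersion.V_0_12, WriterVersion.V_1_5})
FROZEN_READER = frozenset({WriterVersion.V_0_12, WriterVersion.V_2_0})
```

With this shape, a first 2.0 release would keep `default=WriterVersion.V_0_12`, and only a later release would flip the default to `V_2_0`. A file written with the experimental version is rejected by `FROZEN_READER` with a clear message rather than being misread.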