On Fri, Aug 4, 2017 at 12:15 PM, Alan Gates <[email protected]> wrote:

> Let me make sure I have the backwards compatibility straight. If a user
> switches to ORC 2.0, he could choose to continue writing in older formats
> so that his old tools could read it. Then once all his tools are upgraded
> he could throw a config switch and new data would be written in the new
> format. Once that switch was thrown, any pre-ORC 2.0 tools would be
> unusable. Before throwing that switch, he would get none of the benefits
> of ORC 2.0. Is this summary correct?
>

Yes, exactly.

>
> If so, I agree we should do this. The list of potential benefits for
> performance and space efficiency is compelling. And the long lag for users
> with many old tools to upgrade will never get better.
>
> Alan.
>
> On Fri, Aug 4, 2017 at 9:29 AM, Owen O'Malley <[email protected]>
> wrote:
>
> > All,
> > We've started the process of updating the encodings for ORC. These
> > changes are going to extend the format in ways that aren't forward
> > compatible. (E.g., the ORC 1.4 readers won't be able to read the new
> > format.)
> >
> > The changes that I've heard about are:
> > * Decimal encoding - this will likely be separated into two categories
> >    + precision <= 18
> >    + precision > 18
> >    In both cases the precision and scale will be fixed for the entire
> >    file rather than per value.
> > * a new Float/Double encoding
> > * a new RLE encoding
> >
> > Are there other encodings that we should consider adding?
> >
> > We haven't made forward-incompatible changes in a while. Currently the
> > ORC Writer can write either:
> > * Hive 0.11 ORC files
> > * Hive 0.12 ORC files
> >
> > So I'd like to propose that we add a new ORC 2.0 file version and all of
> > these changes need to be so tagged.
> >
> > The new ORC writers will maintain the ability to write the old versions
> > of the files (Hive 0.11 ORC and Hive 0.12 ORC) as well as the ORC 2.0
> > files. The new reader will automatically read all three versions.
> >
> > Thoughts?
> >
> > Owen
> >
>
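
For anyone skimming the thread, the compatibility rule confirmed above can be sketched in a few lines of Python. This is only an illustration, not the ORC API: the names FileVersion, can_read, and choose_write_version are hypothetical, and the numeric tuples attached to each version are assumptions made so the versions can be ordered.

```python
from enum import Enum


class FileVersion(Enum):
    """File versions named in the thread; the tuple values are illustrative."""
    HIVE_0_11 = (0, 11)
    HIVE_0_12 = (0, 12)
    ORC_2_0 = (2, 0)


def can_read(reader_max: FileVersion, file_version: FileVersion) -> bool:
    """A reader handles every version up to the newest one it knows about."""
    return file_version.value <= reader_max.value


def choose_write_version(use_new_format: bool) -> FileVersion:
    """Hypothetical config switch: keep the old format until it is thrown."""
    return FileVersion.ORC_2_0 if use_new_format else FileVersion.HIVE_0_12


old_reader = FileVersion.HIVE_0_12  # stands in for a pre-ORC 2.0 tool
new_reader = FileVersion.ORC_2_0    # stands in for an upgraded tool

for switch in (False, True):
    written = choose_write_version(switch)
    print(switch, written.name,
          "old reader:", can_read(old_reader, written),
          "new reader:", can_read(new_reader, written))
```

With the switch off, both readers can read what is written but none of the new encodings are used; with it on, only the upgraded reader can read the new files.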
