All,

We've started the process of updating the encodings for ORC. These changes will extend the format in ways that aren't forward compatible (e.g., the ORC 1.4 readers won't be able to read the new format).

The changes that I've heard about are:
* Decimal encoding - this will likely be separated into two categories:
  + precision <= 18
  + precision > 18
  In both cases the precision and scale will be fixed for the entire file rather than per value.
* a new Float/Double encoding
* a new RLE encoding

Are there other encodings that we should consider adding?

We haven't made forward incompatible changes in a while. Currently the ORC Writer can write either:
* Hive 0.11 ORC files
* Hive 0.12 ORC files

So I'd like to propose that we add a new ORC 2.0 file version and that all of these changes be tagged with it. The new ORC writers will maintain the ability to write the old versions of the files (Hive 0.11 ORC and Hive 0.12 ORC) as well as the ORC 2.0 files. The new reader will automatically read all three versions.

Thoughts?

Owen
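
P.S. For anyone wondering about the precision <= 18 split, here is a rough sketch of one plausible motivation. It is an assumption, not something settled above: every decimal with at most 18 digits has an unscaled value that fits in a signed 64-bit integer, while 19 digits can overflow. With the scale fixed for the whole file, each value could then be stored as a plain integer. The names `fits_in_int64` and `to_unscaled` are hypothetical helpers for illustration only and are not part of any ORC API.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1  # largest value of a signed 64-bit integer


def fits_in_int64(precision: int) -> bool:
    """Return True if every unscaled value with `precision` digits fits in int64."""
    largest_unscaled = 10**precision - 1  # e.g. precision 3 -> 999
    return largest_unscaled <= INT64_MAX


def to_unscaled(value: Decimal, scale: int) -> int:
    """Convert a decimal to its unscaled integer at a fixed, file-wide scale.

    Hypothetical helper: raises if the value has more fractional digits
    than the file-wide scale allows.
    """
    shifted = value.scaleb(scale)  # multiply by 10**scale
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} does not fit scale {scale}")
    return int(shifted)


if __name__ == "__main__":
    print(fits_in_int64(18), fits_in_int64(19))
    print(to_unscaled(Decimal("123.45"), 2))
```

Under this assumption, the precision > 18 category would need a wider or variable-length representation, which is why the two cases might be encoded differently.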