Have we enabled the 2.0 encodings?

On Wed, Dec 30, 2015 at 5:34 PM, Benjamin Anderson <[email protected]> wrote:
> Hi there - I'm working on a small Parquet project and encountering
> some surprising results with regard to encoding decisions.
>
> My dataset consists of ~1.5MM log lines parsed to an Avro schema and
> written to a Parquet file via AvroParquetWriter. According to its log
> output, Parquet is writing all int/long columns out with either
> [BIT_PACKED, PLAIN] or [BIT_PACKED, PLAIN_DICTIONARY]. This surprised
> me - at least one of those columns is an epoch value that should be
> quite amenable to the DELTA_BINARY_PACKED. What's the best way to
> understand Parquet's encoding choices?
>
> Secondary question: Is DELTA_BINARY_PACKED supported for INT64
> columns? The documentation[1] says it is, but the code[2] suggests
> otherwise.
>
> Cheers,
> --
> b
>
> [1]:
> https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5
> [2]:
> https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L166-L168
>
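
For what it's worth, here is a rough sketch of why a sorted epoch column looks like a good fit for delta encoding. This is not parquet-mr code: the class name `DeltaWidthSketch`, the method `bitsPerDelta`, and the sample timestamps are all made up for illustration. It only mirrors the general idea in the format doc (take consecutive deltas, subtract the smallest delta, bit-pack the remainder) and ignores blocks, miniblocks, and overflow handling.

```java
public class DeltaWidthSketch {

    /**
     * Hypothetical helper: bits needed to store each delta once the
     * smallest delta has been subtracted from all of them.
     * Assumes the deltas do not overflow a long.
     */
    static int bitsPerDelta(long[] values) {
        if (values.length < 2) {
            return 0; // nothing to delta-encode beyond the first value
        }
        long minDelta = Long.MAX_VALUE;
        long maxDelta = Long.MIN_VALUE;
        for (int i = 1; i < values.length; i++) {
            long delta = values[i] - values[i - 1];
            minDelta = Math.min(minDelta, delta);
            maxDelta = Math.max(maxDelta, delta);
        }
        long range = maxDelta - minDelta;
        // Number of bits required to represent `range` (0 when all deltas are equal).
        return 64 - Long.numberOfLeadingZeros(range);
    }

    public static void main(String[] args) {
        // Made-up millisecond timestamps, roughly in arrival order.
        long[] epochs = {
            1451496840000L, 1451496840250L, 1451496841000L,
            1451496841100L, 1451496843000L
        };
        // Deltas are 250, 750, 100, 1900; relative to the minimum (100)
        // the largest is 1800, which fits in 11 bits.
        System.out.println(bitsPerDelta(epochs)); // prints 11
    }
}
```

So in this toy case each value after the first needs about 11 bits instead of the 64 that PLAIN spends on an INT64, which is the kind of saving the original question was expecting.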
