Hi there - I'm working on a small Parquet project and encountering some surprising results with regard to encoding decisions.
My dataset consists of ~1.5MM log lines parsed to an Avro schema and written to a Parquet file via AvroParquetWriter. According to its log output, Parquet is writing all int/long columns with either [BIT_PACKED, PLAIN] or [BIT_PACKED, PLAIN_DICTIONARY]. This surprised me - at least one of those columns is an epoch value that should be quite amenable to DELTA_BINARY_PACKED.

What's the best way to understand Parquet's encoding choices?

Secondary question: Is DELTA_BINARY_PACKED supported for INT64 columns? The documentation[1] says it is, but the code[2] suggests otherwise.

Cheers,
-- b

[1]: https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-encoding-delta_binary_packed--5
[2]: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L166-L168
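
P.S. To illustrate why I expected the epoch column to suit delta encoding, here is a rough Python sketch. It is not Parquet's actual DELTA_BINARY_PACKED implementation (it ignores blocks, miniblocks, and headers); it only estimates how many bits each value would need once consecutive deltas are stored relative to the smallest delta. The function name `delta_bit_width` and the timestamps are made up for illustration and are not taken from my real data.

```python
def delta_bit_width(values):
    """Bits per value if consecutive deltas are stored as offsets from the minimum delta."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return 0
    min_delta = min(deltas)
    # Offsets from the smallest delta are non-negative, so they can be bit-packed.
    max_offset = max(d - min_delta for d in deltas)
    return max_offset.bit_length()


# Hypothetical epoch-millisecond timestamps: roughly 250 ms apart with small jitter.
timestamps = [1_400_000_000_000 + i * 250 + (i % 7) for i in range(1000)]

plain_bits = 64  # a PLAIN-encoded INT64 always takes 64 bits per value
packed_bits = delta_bit_width(timestamps)
print(f"plain: {plain_bits} bits/value, delta-packed: about {packed_bits} bits/value")
```

On near-monotonic values like these, the per-value width drops from 64 bits to a handful, which is the saving I was hoping to see on the epoch column.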
