We had a discussion recently [1] in which a Python implementation of Parquet [2] had used the RLE encoding type to encode the data pages for INT32 values with the UINT_8 logical type (not dictionary-encoded).
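For anyone less familiar with the encoding in question, here is a minimal sketch of a decoder for the RLE / bit-packing hybrid as I read it in Encodings.md. The function names (`read_varint`, `decode_hybrid`) are my own and not taken from any implementation, and the sketch assumes the caller already knows the bit width and has stripped any length prefix.

```python
def read_varint(buf, pos):
    """Read an unsigned LEB128 varint; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_hybrid(buf, bit_width, num_values):
    """Decode num_values integers from an RLE / bit-packed hybrid buffer.

    Hypothetical helper for illustration only.
    """
    values = []
    pos = 0
    mask = (1 << bit_width) - 1
    value_bytes = (bit_width + 7) // 8  # RLE value width, rounded up to bytes
    while len(values) < num_values:
        header, pos = read_varint(buf, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values,
            # packed from the least significant bit of each byte.
            groups = header >> 1
            nbytes = groups * bit_width
            packed = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            for i in range(groups * 8):
                values.append((packed >> (i * bit_width)) & mask)
        else:
            # RLE run: one value repeated (header >> 1) times.
            run_len = header >> 1
            value = int.from_bytes(buf[pos:pos + value_bytes], "little")
            pos += value_bytes
            values.extend([value] * run_len)
    # A bit-packed run may carry padding values past num_values.
    return values[:num_values]
```

For example, with a bit width of 3, `decode_hybrid(bytes([0x14, 0x05]), 3, 10)` reads a single RLE run of ten 5s, and `decode_hybrid(bytes([0x03, 0x88, 0xC6, 0xFA]), 3, 8)` reads one bit-packed group holding the values 0 through 7.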
In the Encodings.md document [3] in the Parquet format, it is not strictly indicated that the RLE encoding is only to be used for repetition/definition levels and boolean values, though that is all that is supported in parquet-mr [4], parquet-cpp, Impala [5], and other implementations.

So, two questions:

1) Was RLE (the hybrid RLE / bit-packed encoder used for repetition/definition levels) ever intended for encoding data pages in the Parquet V1 format?

2) Whether yes or no, should we update apache/parquet-format to be more explicit about the purpose and scope of this encoding?

Thanks,
Wes

[1]: https://github.com/dask/fastparquet/issues/256
[2]: https://github.com/dask/fastparquet
[3]: https://github.com/apache/parquet-format/blob/master/Encodings.md
[4]: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L115
[5]: https://github.com/apache/impala/blob/master/be/src/exec/parquet-column-readers.cc#L495
