Sorry, to clarify, in this question:

1) Was RLE (the Hybrid-bitpacked RLE encoder used for
repetition/definition levels) ever intended for use for encoding data
pages in the Parquet V1 format?

I meant for encoding data pages that do not contain dictionary indices
(i.e., as an alternative to PLAIN or PLAIN_DICTIONARY/RLE_DICTIONARY).
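
For anyone following along, here is a minimal sketch of the hybrid
encoding in question, as I read the grammar in Encodings.md: each run
starts with a varint header whose low bit selects RLE (0) or bit-packed
(1). The function names are hypothetical and not taken from any
implementation, the encoder only ever emits RLE runs, and the 4-byte
length prefix used for levels in V1 data pages is left out.

```python
def write_varint(n):
    """ULEB128-encode a non-negative integer."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf, pos):
    """Decode a ULEB128 integer from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_rle_only(values, bit_width):
    """Encode values using RLE runs only (a valid but naive hybrid stream)."""
    value_bytes = (bit_width + 7) // 8  # value is padded to whole bytes
    out = bytearray()
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        out += write_varint((j - i) << 1)  # low bit 0 marks an RLE run
        out += values[i].to_bytes(value_bytes, "little")
        i = j
    return bytes(out)


def decode_hybrid(buf, bit_width, num_values):
    """Decode a stream containing RLE runs and/or bit-packed runs."""
    value_bytes = (bit_width + 7) // 8
    mask = (1 << bit_width) - 1
    out = []
    pos = 0
    while len(out) < num_values:
        header, pos = read_varint(buf, pos)
        if header & 1:
            # Bit-packed run: (header >> 1) groups of 8 values each,
            # packed from the least significant bit of each byte.
            groups = header >> 1
            nbytes = groups * bit_width
            chunk = int.from_bytes(buf[pos:pos + nbytes], "little")
            pos += nbytes
            for k in range(groups * 8):
                out.append((chunk >> (k * bit_width)) & mask)
        else:
            # RLE run: one value repeated (header >> 1) times.
            run_len = header >> 1
            value = int.from_bytes(buf[pos:pos + value_bytes], "little")
            pos += value_bytes
            out.extend([value] * run_len)
    return out[:num_values]  # drop padding from the last bit-packed group
```

Nothing in this scheme is specific to levels: any column of small
non-negative integers with a known bit width (e.g. 8 for UINT_8) can be
pushed through it, which is presumably why the question comes up.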

On Wed, Dec 6, 2017 at 4:53 PM, Wes McKinney <[email protected]> wrote:
> We had a discussion recently [1] in which a Python implementation of
> Parquet [2] had used the RLE encoding type for encoding the data pages
> for INT32 values with UINT_8 logical type (non-dictionary-encoded).
>
> In the Encodings.md document [3] in the Parquet format, it is not
> explicitly stated that the RLE encoding is to be used only for
> definition/repetition levels and boolean values, though that is all
> that is supported in parquet-mr [4], parquet-cpp, Impala [5], and other
> implementations.
>
> So questions:
>
> 1) Was RLE (the Hybrid-bitpacked RLE encoder used for
> repetition/definition levels) ever intended for use for encoding data
> pages in the Parquet V1 format?
>
> 2) Whether yes or no, should we update apache/parquet-format to be
> more explicit about the purpose and scope of this encoding?
>
> Thanks,
> Wes
>
> [1]: https://github.com/dask/fastparquet/issues/256
> [2]: https://github.com/dask/fastparquet
> [3]: https://github.com/apache/parquet-format/blob/master/Encodings.md
> [4]: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/column/Encoding.java#L115
> [5]: https://github.com/apache/impala/blob/master/be/src/exec/parquet-column-readers.cc#L495
