sfc-gh-nthimmegowda opened a new pull request, #14147: URL: https://github.com/apache/arrow/pull/14147
ARROW-17450 Currently, parquet-cpp does not support columns encoded with RLE. RLE is used sparingly, in only three cases (repetition and definition levels, dictionary indices, and boolean values in data pages), but some implementations, such as Athena on AWS, do apply it directly to boolean columns. Although parquet-cpp can already encode and decode RLE repetition and definition levels, it has no support for boolean columns with RLE encoding. This PR extends column scanning to support such columns. The first 4 bytes of the data hold the size of the encoded data; this size is parsed first, and the encoded data is then passed to the decoder. Added a test with an RLE-encoded boolean Parquet file to validate that values can be read both individually and in batches.
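The length-prefixed layout described above can be illustrated with a small standalone sketch. This is not the code from the PR and does not use the parquet-cpp API: the function name DecodeRleBooleans is hypothetical, and the sketch assumes the layout given in the Parquet format specification, namely a 4-byte little-endian length followed by RLE/bit-packed hybrid runs with a bit width of 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of parquet-cpp). Decodes a boolean page body
// laid out as <4-byte little-endian length><RLE/bit-packed hybrid runs>,
// assuming a bit width of 1.
std::vector<bool> DecodeRleBooleans(const std::vector<uint8_t>& page,
                                    std::size_t num_values) {
  if (page.size() < 4) throw std::runtime_error("missing length prefix");

  // Step 1: parse the size of the encoded data from the first 4 bytes.
  const uint32_t encoded_len = static_cast<uint32_t>(page[0]) |
                               (static_cast<uint32_t>(page[1]) << 8) |
                               (static_cast<uint32_t>(page[2]) << 16) |
                               (static_cast<uint32_t>(page[3]) << 24);
  if (encoded_len > page.size() - 4) {
    throw std::runtime_error("length prefix exceeds page size");
  }
  std::size_t pos = 4;
  const std::size_t end = 4 + static_cast<std::size_t>(encoded_len);

  // Step 2: hand the remaining bytes to the run decoder.
  std::vector<bool> out;
  out.reserve(num_values);
  while (out.size() < num_values) {
    // Each run starts with a ULEB128 varint header.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (pos >= end || shift > 28) throw std::runtime_error("bad run header");
      const uint8_t byte = page[pos++];
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    const uint64_t count = header >> 1;
    if (count == 0) throw std::runtime_error("empty run");

    if (header & 1) {
      // Bit-packed run: `count` groups of 8 values, one byte per group at
      // bit width 1, least significant bit first.
      for (uint64_t g = 0; g < count && out.size() < num_values; ++g) {
        if (pos >= end) throw std::runtime_error("truncated bit-packed run");
        const uint8_t byte = page[pos++];
        for (int bit = 0; bit < 8 && out.size() < num_values; ++bit) {
          out.push_back(((byte >> bit) & 1) != 0);
        }
      }
    } else {
      // RLE run: one byte holding the value, repeated `count` times.
      if (pos >= end) throw std::runtime_error("truncated RLE run");
      const bool value = (page[pos++] & 1) != 0;
      for (uint64_t i = 0; i < count && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    }
  }
  return out;
}
```

For example, the bytes 04 00 00 00 06 01 03 05 declare 4 bytes of encoded data: an RLE run of three true values (06 01) followed by one bit-packed group (03 05). Trailing padding bits in the last bit-packed group are dropped once num_values values have been produced.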