sfc-gh-nthimmegowda opened a new pull request, #14147:
URL: https://github.com/apache/arrow/pull/14147

   ARROW-17450
   
   Currently, parquet-cpp cannot read data columns that use RLE encoding. 
Although RLE encoding is used sparingly, in only three places (repetition and 
definition levels, dictionary indices, and boolean values in data pages), some 
implementations do apply it directly to boolean columns (Athena on AWS). 
parquet-cpp already encodes and decodes RLE repetition and definition levels, 
but it has no support for boolean columns with RLE encoding. 
   
   This PR extends column scanning to support columns with RLE encoding. 
The first 4 bytes of the data hold the size of the encoded data; this length 
is parsed first, and the encoded data is then passed to the decoder. 
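   
The read path described above can be sketched as follows. This is a minimal, standalone illustration and not the code in this PR: the function name `DecodeRleBooleans` is hypothetical, and the sketch assumes the RLE/bit-packing hybrid layout from the Parquet format specification with a bit width of 1 and a little-endian 4-byte length prefix.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of the parquet-cpp API): decodes up to
// num_values booleans from a buffer laid out as
//   <4-byte little-endian length> <RLE/bit-packed hybrid runs, bit width 1>.
std::vector<bool> DecodeRleBooleans(const uint8_t* data, size_t size,
                                    size_t num_values) {
  if (size < 4) throw std::runtime_error("missing 4-byte length prefix");

  // Step 1: parse the length prefix.
  const uint32_t encoded_len = static_cast<uint32_t>(data[0]) |
                               (static_cast<uint32_t>(data[1]) << 8) |
                               (static_cast<uint32_t>(data[2]) << 16) |
                               (static_cast<uint32_t>(data[3]) << 24);
  if (encoded_len > size - 4) throw std::runtime_error("length exceeds buffer");

  // Step 2: hand only the encoded bytes to the run decoder.
  const uint8_t* p = data + 4;
  const uint8_t* end = p + encoded_len;
  std::vector<bool> out;
  out.reserve(num_values);

  while (out.size() < num_values) {
    // Each run starts with a ULEB128 varint header.
    uint64_t header = 0;
    int shift = 0;
    while (true) {
      if (p == end || shift > 63) throw std::runtime_error("bad run header");
      const uint8_t byte = *p++;
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    const uint64_t count = header >> 1;
    if (count == 0) throw std::runtime_error("empty run");

    if (header & 1) {
      // Bit-packed run: `count` groups of 8 values, one byte per group at
      // bit width 1, least significant bit first.
      for (uint64_t g = 0; g < count && out.size() < num_values; ++g) {
        if (p == end) throw std::runtime_error("truncated bit-packed run");
        const uint8_t bits = *p++;
        for (int i = 0; i < 8 && out.size() < num_values; ++i) {
          out.push_back(((bits >> i) & 1) != 0);
        }
      }
    } else {
      // RLE run: one value byte repeated `count` times.
      if (p == end) throw std::runtime_error("truncated RLE run");
      const bool value = (*p++ & 1) != 0;
      for (uint64_t i = 0; i < count && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    }
  }
  return out;
}
```

For example, the 8-byte buffer `04 00 00 00 0A 01 03 05` carries a length of 4, an RLE run of five `true` values, and one bit-packed group; asking for 8 values yields five `true` values followed by `true, false, true`.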
   
   Added a test with an RLE-encoded boolean Parquet file to validate that 
values can be read individually and in batches. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
