sfc-gh-nthimmegowda commented on code in PR #14147: URL: https://github.com/apache/arrow/pull/14147#discussion_r974958686
##########
cpp/src/parquet/encoding.cc:
##########
@@ -2355,6 +2355,80 @@ class DeltaLengthByteArrayDecoder : public DecoderImpl,
   std::shared_ptr<ResizableBuffer> buffered_data_;
 };
 
+// ----------------------------------------------------------------------
+// RLE_BOOLEAN_DECODER
+
+class RleBooleanDecoder : public DecoderImpl, virtual public BooleanDecoder {
+ public:
+  explicit RleBooleanDecoder(const ColumnDescriptor* descr)
+      : DecoderImpl(descr, Encoding::RLE) {}
+
+  void SetData(int num_values, const uint8_t* data, int len) override {
+    num_values_ = num_values;
+    uint32_t num_bytes = 0;
+
+    if (len < 4) {
+      throw ParquetException("Received invalid length : " + std::to_string(len) +
+                             " (corrupt data page?)");
+    }
+    // Load the first 4 bytes in little-endian, which indicates the length
+    num_bytes =
+        ::arrow::bit_util::ToLittleEndian(::arrow::util::SafeLoadAs<uint32_t>(data));
+    if (num_bytes < 0 || num_bytes > (uint32_t)(len - 4)) {
+      throw ParquetException("Received invalid number of bytes : " +
+                             std::to_string(num_bytes) + " (corrupt data page?)");
+    }
+
+    const uint8_t* decoder_data = data + 4;
+    decoder_ = std::make_shared<::arrow::util::RleDecoder>(decoder_data, num_bytes,
+                                                           /*bit_width=*/1);
+  }
+
+  int Decode(bool* buffer, int max_values) override {
+    max_values = std::min(max_values, num_values_);
+    int val = 0;

Review Comment:
   Parquet represents boolean values as `0` and `1` (nulls aside), so a boolean is logically 1 bit wide. Boolean is supported in only two encodings: [Plain](https://parquet.apache.org/docs/file-format/data-pages/encodings/#a-nameplainaplain-plain--0) and [RLE](https://parquet.apache.org/docs/file-format/data-pages/encodings/#a-namerlearun-length-encoding--bit-packing-hybrid-rle--3). Both rely on bit-packing to some extent. With RLE, runs of repeated values can bring the average size below 1 bit per value, but a value never physically occupies more than 1 bit.
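   To make the 1-bit point concrete, here is a minimal standalone sketch of decoding a length-prefixed RLE/bit-packing hybrid buffer at bit width 1. It assumes the layout described in the Parquet encodings document linked above (4-byte little-endian length, then runs with a ULEB128 header whose low bit selects bit-packed or RLE). `DecodeRleBooleans` is a hypothetical helper written for illustration only; it is not the code in this PR and not an Arrow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Arrow): decodes `num_values` booleans from
// a buffer laid out as <4-byte LE length><RLE/bit-packed hybrid runs>, with
// an assumed bit width of 1.
std::vector<bool> DecodeRleBooleans(const uint8_t* data, size_t len,
                                    size_t num_values) {
  assert(data != nullptr);  // precondition: caller passes a valid buffer
  if (len < 4) throw std::runtime_error("buffer too short for length prefix");

  // The first 4 bytes hold the payload length in little-endian order.
  const uint32_t num_bytes = static_cast<uint32_t>(data[0]) |
                             (static_cast<uint32_t>(data[1]) << 8) |
                             (static_cast<uint32_t>(data[2]) << 16) |
                             (static_cast<uint32_t>(data[3]) << 24);
  if (num_bytes > len - 4) throw std::runtime_error("length prefix too large");

  const uint8_t* p = data + 4;
  const uint8_t* end = p + num_bytes;
  std::vector<bool> out;
  out.reserve(num_values);

  while (out.size() < num_values && p < end) {
    // Read the ULEB128-encoded run header.
    uint64_t header = 0;
    int shift = 0;
    for (;;) {
      if (p == end || shift > 63) throw std::runtime_error("bad run header");
      const uint8_t byte = *p++;
      header |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) break;
      shift += 7;
    }
    const uint64_t count = header >> 1;

    if (header & 1) {
      // Bit-packed run: `count` groups of 8 values; at bit width 1 each
      // group is exactly one byte, least significant bit first.
      for (uint64_t g = 0; g < count && out.size() < num_values; ++g) {
        if (p == end) throw std::runtime_error("truncated bit-packed run");
        const uint8_t byte = *p++;
        for (int bit = 0; bit < 8 && out.size() < num_values; ++bit) {
          out.push_back(((byte >> bit) & 1) != 0);
        }
      }
    } else {
      // RLE run: one byte holding the value, repeated `count` times.
      if (p == end) throw std::runtime_error("truncated RLE run");
      const bool value = (*p++ & 1) != 0;
      for (uint64_t i = 0; i < count && out.size() < num_values; ++i) {
        out.push_back(value);
      }
    }
  }
  if (out.size() < num_values) throw std::runtime_error("not enough values");
  return out;
}
```

   For the bytes `04 00 00 00 0A 01 03 05` with `num_values = 8`, the RLE run (`0A 01`) yields five `true` values and the bit-packed group (`03 05`) contributes `true, false, true`. Five values cost two bytes in the RLE run, while the bit-packed group spends exactly 1 bit per value.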