jhorstmann opened a new issue, #5338:
URL: https://github.com/apache/arrow-rs/issues/5338

   The documentation for the [Parquet BIT_PACKED encoding](https://github.com/apache/parquet-format/blob/master/Encodings.md#bit-packed-deprecated-bit_packed--4) says:
   
   > For compatibility reasons, this implementation packs values from the most significant bit to the least significant bit, which is not the same as the [RLE/bit-packing](https://github.com/apache/parquet-format/blob/master/Encodings.md#RLE) hybrid.
   
   This is followed by an example that clearly differs from the example for the RLE encoding. The documentation there also says:
   
   > The bit-packing here is done in a different order than the one in the [deprecated bit-packing](https://github.com/apache/parquet-format/blob/master/Encodings.md#BITPACKED) encoding
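
To make the two documented bit orders concrete, here is a minimal sketch that packs values both ways. The function names `pack_lsb_first` and `pack_msb_first` are hypothetical helpers written for this illustration; they are not part of arrow-rs or parquet2, and they assume a bit width of at most 8.

```rust
/// Pack values starting at the least significant bit of each byte,
/// the order the spec describes for the RLE/bit-packing hybrid.
/// Hypothetical helper for illustration; assumes `bit_width <= 8`.
fn pack_lsb_first(values: &[u8], bit_width: u32) -> Vec<u8> {
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // absolute bit position in the output
    for &v in values {
        // Emit the value's bits from lowest to highest.
        for i in 0..bit_width {
            let bit = (v >> i) & 1;
            out[pos / 8] |= bit << (pos % 8);
            pos += 1;
        }
    }
    out
}

/// Pack values starting at the most significant bit of each byte,
/// the order the spec describes for the deprecated BIT_PACKED encoding.
/// Hypothetical helper for illustration; assumes `bit_width <= 8`.
fn pack_msb_first(values: &[u8], bit_width: u32) -> Vec<u8> {
    let total_bits = values.len() * bit_width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // absolute bit position in the output
    for &v in values {
        // Emit the value's bits from highest to lowest.
        for i in (0..bit_width).rev() {
            let bit = (v >> i) & 1;
            out[pos / 8] |= bit << (7 - pos % 8);
            pos += 1;
        }
    }
    out
}
```

For the values 0 through 7 at bit width 3 (the input used in both spec examples), `pack_lsb_first` yields `[0x88, 0xC6, 0xFA]` and `pack_msb_first` yields `[0x05, 0x39, 0x77]`, matching the byte patterns shown in the RLE and BIT_PACKED sections respectively. A single shared decoder could therefore read only one of the two layouts correctly.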
   
   However, in the arrow-rs/parquet code base, both encodings use the same [`BitReader::get_batch` implementation](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/util/bit_util.rs#L438). For [BIT_PACKED it is used directly](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/column/reader/decoder.rs#L276), while for RLE it is used indirectly via [`RleDecoder::get_batch`](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/encodings/rle.rs#L397). I think parquet2 reuses its bit-packing logic in a similar way.
   
   As far as I know, both Rust Parquet implementations pass the integration test suite, so there are several possible explanations for this discrepancy:
   
    - The documentation is wrong, perhaps confusing bit order with little-/big-endian byte order.
    - The Rust code is wrong, but the BIT_PACKED encoding is not used in practice, not even in the test suite.
    - The difference only shows on big-endian machines (I don't think this can be the case, since the examples show bytes).

