jhorstmann opened a new issue, #5338: URL: https://github.com/apache/arrow-rs/issues/5338
The documentation for the [parquet BIT_PACKED encoding](https://github.com/apache/parquet-format/blob/master/Encodings.md#bit-packed-deprecated-bit_packed--4) says:

> For compatibility reasons, this implementation packs values from the most significant bit to the least significant bit, which is not the same as the [RLE/bit-packing](https://github.com/apache/parquet-format/blob/master/Encodings.md#RLE) hybrid.

This is followed by an example that is clearly different from the example for the RLE encoding. The RLE section of the documentation also says:

> The bit-packing here is done in a different order than the one in the [deprecated bit-packing](https://github.com/apache/parquet-format/blob/master/Encodings.md#BITPACKED) encoding

However, in the arrow-rs/parquet code base, I see both encodings use the same [`BitReader::get_batch` implementation](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/util/bit_util.rs#L438). For [BIT_PACKED it is used directly](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/column/reader/decoder.rs#L276), while for RLE it is used indirectly via [`RleDecoder::get_batch`](https://github.com/apache/arrow-rs/blob/50.0.0/parquet/src/encodings/rle.rs#L397). I think parquet2 reuses the bit-packing logic in a similar way.

As far as I know, both Rust parquet implementations pass the integration test suite, so there are several possible explanations for this discrepancy:

- The documentation is wrong, maybe confusing bit order with little-/big-endian byte order.
- The Rust code is wrong, but the BIT_PACKED encoding is not used in practice, not even in the test suite.
- The difference only shows on big-endian machines (I don't think this can be the case, since the examples show bytes).
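To make the two orderings described in Encodings.md concrete, here is a minimal Rust sketch that packs the values 0 through 7 at bit width 3 both ways. The helpers `pack_lsb_first`, `pack_msb_first` and `unpack_lsb_first` are hypothetical and are not part of arrow-rs or parquet2. The expected bytes are the ones I get from the binary strings in the spec's examples: `0x88 0xC6 0xFA` for the RLE/bit-packing hybrid and `0x05 0x39 0x77` for BIT_PACKED.

```rust
/// Hypothetical helper (not an arrow-rs API): packs each value into
/// `bit_width` bits, least significant bit first, filling each byte from
/// its low bit upward. This is the order the spec gives for the RLE hybrid.
fn pack_lsb_first(values: &[u32], bit_width: usize) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() * bit_width + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for b in 0..bit_width {
            if (v >> b) & 1 == 1 {
                let pos = i * bit_width + b;
                out[pos / 8] |= 1 << (pos % 8);
            }
        }
    }
    out
}

/// Hypothetical helper: packs each value most significant bit first,
/// filling each byte from its high bit downward. This is the order the
/// spec gives for the deprecated BIT_PACKED encoding.
fn pack_msb_first(values: &[u32], bit_width: usize) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() * bit_width + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for b in 0..bit_width {
            // b == 0 selects the value's most significant bit
            if (v >> (bit_width - 1 - b)) & 1 == 1 {
                let pos = i * bit_width + b;
                out[pos / 8] |= 0x80 >> (pos % 8);
            }
        }
    }
    out
}

/// Hypothetical helper: the inverse of `pack_lsb_first`.
fn unpack_lsb_first(bytes: &[u8], bit_width: usize, count: usize) -> Vec<u32> {
    (0..count)
        .map(|i| {
            let mut v = 0u32;
            for b in 0..bit_width {
                let pos = i * bit_width + b;
                if (bytes[pos / 8] >> (pos % 8)) & 1 == 1 {
                    v |= 1 << b;
                }
            }
            v
        })
        .collect()
}

fn main() {
    let values: Vec<u32> = (0..8).collect();

    // Byte values from the two examples in Encodings.md.
    assert_eq!(pack_lsb_first(&values, 3), [0x88, 0xC6, 0xFA]);
    let bit_packed = pack_msb_first(&values, 3);
    assert_eq!(bit_packed, [0x05, 0x39, 0x77]);

    // Reading BIT_PACKED-ordered bytes with an LSB-first reader
    // does not give back 0..=7.
    println!("{:?}", unpack_lsb_first(&bit_packed, 3, values.len()));
    // prints [5, 0, 4, 4, 3, 6, 5, 3]
}
```

Assuming `BitReader::get_batch` reads LSB-first (it has to, since RLE decoding works), the last line shows what it would return for data that really follows the spec's BIT_PACKED order. The two orders only agree for bit width 1 when bytes are also bit-reversed, so for general widths the mismatch would be visible in the decoded values.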
