wgtmac commented on PR #34323: URL: https://github.com/apache/arrow/pull/34323#issuecomment-1448474977
IMHO, `DeltaBitPackEncoder` has two possible optimizations.

1. The mini-block size is fixed. It could instead be chosen adaptively based on the data distribution, which would in turn affect the decoding time: https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc#L2105
   ```cpp
   static constexpr uint32_t kValuesPerBlock = 128;
   static constexpr uint32_t kMiniBlocksPerBlock = 4;
   ```
2. The encoder computes deltas and the decoder restores the values from them. Both steps can be vectorized to speed up encoding and decoding: https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc#L2526
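To make both points concrete, here is a rough standalone sketch. It is not the Arrow implementation: `ComputeDeltas`, `RestoreValues`, `BitWidth` and `PackedBits` are hypothetical helpers written for illustration. The cost model in `PackedBits` assumes one min-delta per block and one bit-width byte per mini-block, which is my reading of the DELTA_BINARY_PACKED layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encoder side: deltas[i] = values[i + 1] - values[i].
// Iterations are independent, so this loop is a candidate for SIMD.
// Subtraction is done in uint64_t so overflow wraps instead of being UB.
std::vector<int64_t> ComputeDeltas(const std::vector<int64_t>& values) {
  std::vector<int64_t> deltas;
  if (values.size() < 2) return deltas;
  deltas.resize(values.size() - 1);
  for (size_t i = 0; i < deltas.size(); ++i) {
    deltas[i] = static_cast<int64_t>(static_cast<uint64_t>(values[i + 1]) -
                                     static_cast<uint64_t>(values[i]));
  }
  return deltas;
}

// Decoder side: a running (prefix) sum over the deltas.
// This scalar loop has a loop-carried dependency; a vectorized version
// would need a scan-style prefix sum rather than this direct form.
std::vector<int64_t> RestoreValues(int64_t first,
                                   const std::vector<int64_t>& deltas) {
  std::vector<int64_t> values(deltas.size() + 1);
  values[0] = first;
  uint64_t running = static_cast<uint64_t>(first);
  for (size_t i = 0; i < deltas.size(); ++i) {
    running += static_cast<uint64_t>(deltas[i]);
    values[i + 1] = static_cast<int64_t>(running);
  }
  return values;
}

// Number of bits needed to represent x (0 when x == 0).
int BitWidth(uint64_t x) {
  int width = 0;
  while (x != 0) {
    ++width;
    x >>= 1;
  }
  return width;
}

// Estimated payload size in bits for one block of deltas when it is split
// into `mini_blocks` equal mini-blocks. Each mini-block pays 8 bits for its
// bit-width header plus (values per mini-block) * (its own bit width).
uint64_t PackedBits(const std::vector<int64_t>& deltas, size_t mini_blocks) {
  assert(mini_blocks > 0 && deltas.size() % mini_blocks == 0);
  if (deltas.empty()) return 0;
  const int64_t min_delta = *std::min_element(deltas.begin(), deltas.end());
  const size_t per_mini = deltas.size() / mini_blocks;
  uint64_t total = 0;
  for (size_t m = 0; m < mini_blocks; ++m) {
    uint64_t max_adjusted = 0;  // largest (delta - min_delta) in this mini-block
    for (size_t i = m * per_mini; i < (m + 1) * per_mini; ++i) {
      max_adjusted = std::max(max_adjusted, static_cast<uint64_t>(deltas[i]) -
                                                static_cast<uint64_t>(min_delta));
    }
    total += 8 + per_mini * static_cast<uint64_t>(BitWidth(max_adjusted));
  }
  return total;
}
```

Under this cost model, the deltas `{1, 1, 1, 1, 100, 100, 100, 100}` come out at 64 bits with one mini-block, 44 bits with two, and 60 bits with four, so the best split depends on where the large deltas fall. That is the kind of trade-off an adaptive choice would have to weigh against decoding cost.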
