wgtmac commented on PR #34323:
URL: https://github.com/apache/arrow/pull/34323#issuecomment-1448474977

   IMHO, `DeltaBitPackEncoder` has two possible optimizations.
   
   1. The mini-block size is fixed. It could instead be chosen adaptively based 
on the data distribution, which in turn can affect decoding time: 
https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc#L2105
   ```cpp
     static constexpr uint32_t kValuesPerBlock = 128;
     static constexpr uint32_t kMiniBlocksPerBlock = 4;
   ```
   
   2. The encoder computes deltas and the decoder restores the original values 
from them. Both steps could be vectorized to speed up encoding and decoding.
   https://github.com/apache/arrow/blob/main/cpp/src/parquet/encoding.cc#L2526
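
To make both points concrete, here is a rough sketch. It is not Arrow's implementation: `ComputeDeltas`, `RestoreValues`, `BitWidth`, and `PackedPayloadBits` are hypothetical helpers, the input is treated as a single block, and the cost model ignores headers and padding. It shows that the delta loop has no loop-carried dependency (so it is a natural candidate for vectorization), that restoring values is a running sum (which would need a prefix-sum formulation to use SIMD), and how the mini-block size changes the packed payload size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: deltas[i] = values[i + 1] - values[i].
// Iterations are independent, so compilers can typically auto-vectorize this.
// Unsigned arithmetic makes overflow wrap around instead of being undefined.
std::vector<uint32_t> ComputeDeltas(const std::vector<int32_t>& values) {
  std::vector<uint32_t> deltas(values.empty() ? 0 : values.size() - 1);
  for (std::size_t i = 0; i < deltas.size(); ++i) {
    deltas[i] = static_cast<uint32_t>(values[i + 1]) -
                static_cast<uint32_t>(values[i]);
  }
  return deltas;
}

// Hypothetical helper: rebuild the values from the first value and the deltas.
// This is a running sum, so each step depends on the previous one.
std::vector<int32_t> RestoreValues(int32_t first,
                                   const std::vector<uint32_t>& deltas) {
  std::vector<int32_t> values;
  values.reserve(deltas.size() + 1);
  values.push_back(first);
  uint32_t acc = static_cast<uint32_t>(first);
  for (uint32_t d : deltas) {
    acc += d;
    values.push_back(static_cast<int32_t>(acc));
  }
  return values;
}

// Number of bits needed to represent v (0 for v == 0).
uint32_t BitWidth(uint32_t v) {
  uint32_t width = 0;
  while (v != 0) {
    ++width;
    v >>= 1;
  }
  return width;
}

// Simplified cost model (an assumption, not the Parquet layout in full):
// every mini-block stores (delta - min_delta) using the bit width of its
// largest entry. Headers and padding of the last mini-block are ignored.
uint64_t PackedPayloadBits(const std::vector<uint32_t>& deltas,
                           std::size_t mini_block_size) {
  if (deltas.empty() || mini_block_size == 0) return 0;
  // Smallest delta, comparing the deltas as signed values.
  int32_t min_delta = static_cast<int32_t>(deltas[0]);
  for (uint32_t d : deltas) {
    min_delta = std::min(min_delta, static_cast<int32_t>(d));
  }
  uint64_t bits = 0;
  for (std::size_t start = 0; start < deltas.size(); start += mini_block_size) {
    const std::size_t end = std::min(start + mini_block_size, deltas.size());
    uint32_t max_adjusted = 0;
    for (std::size_t i = start; i < end; ++i) {
      max_adjusted =
          std::max(max_adjusted, deltas[i] - static_cast<uint32_t>(min_delta));
    }
    bits += static_cast<uint64_t>(BitWidth(max_adjusted)) * (end - start);
  }
  return bits;
}
```

For example, with the values `{0, 1, 2, 3, 4, 104, 204, 304, 404}` the deltas are four 1s followed by four 100s. Under this model, one mini-block of 8 values needs 7 bits per value, while two mini-blocks of 4 let the first one use 0 bits, which halves the payload. Smaller mini-blocks add per-mini-block overhead that this sketch does not count.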
   

