jhorstmann opened a new issue, #7739:
URL: https://github.com/apache/arrow-rs/issues/7739
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
While looking into parquet loading performance, I noticed that the
definition levels for simple nullable columns often contain very short rle
runs. These seem to mess with branch prediction when decoding and decrease
performance.
I managed to reproduce this in a benchmark by creating a custom RleEncoder
for a bitwidth of 1, which only writes bitpacked runs. With 90% of bits being
set, these are the results:
```
rle_decoder/decode_bitpacked
time: [121.83 µs 121.98 µs 122.13 µs]
thrpt: [8.5857 Gelem/s 8.5966 Gelem/s 8.6068
Gelem/s]
rle_decoder/decode_hybrid
time: [661.33 µs 662.31 µs 663.56 µs]
thrpt: [1.5802 Gelem/s 1.5832 Gelem/s 1.5856
Gelem/s]
```
Note the **decoder** in this benchmark is the same for both cases, the only
difference is how the levels are serialized.
I also hacked some statistics into the decoder. For the above benchmark
these are the distributions of rle vs bitpacked encoded runs. On average, the
rle runs are only 16 elements long, and the bitpacked only encoding even seems
to be shorter:
```
[parquet/benches/rle.rs:292:9] bitpacked_data.len() = 135169
[parquet/benches/rle.rs:292:9] &bitpacked_stats = HybridRleStats {
num_values: 1048576,
num_rle_values: 0,
num_rle_runs: 0,
num_bitpacked_values: 1048576,
num_bitpacked_runs: 2048,
}
[parquet/benches/rle.rs:296:9] hybrid_data.len() = 157050
[parquet/benches/rle.rs:296:9] &hybrid_stats = HybridRleStats {
num_values: 1048576,
num_rle_values: 501915,
num_rle_runs: 29541,
num_bitpacked_values: 546661,
num_bitpacked_runs: 29541,
}
```
**Describe the solution you'd like**
I'd like the RleEncoder to be smarter about switching between bitpacked and
rle runs. Maybe this is a one-line change in
[`encodings/rle.rs`](https://github.com/apache/arrow-rs/blob/df702cfc714dd25060f6a493570221df18b2c598/parquet/src/encodings/rle.rs#L258),
but I find the logic there a bit hard to follow. Alternatives could be a
specialized implementation for single-bit levels, or making this configurable
for the user.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]