xuzifu666 commented on issue #9670:
URL: https://github.com/apache/arrow-rs/issues/9670#issuecomment-4258090019
I'm very interested in this issue and tried it out following your suggestions. It showed at least a 10% improvement in my benchmarks. Below is a before/after comparison of the results, along with a description of how I modified the code:
# Parquet Dictionary Decoding Performance Optimization Benchmark Results
## Test Environment
- **Date**: 2026-04-15
- **Test Command**: `cargo bench -p parquet --features experimental --bench dict_gather_compare`
- **Optimizations**:
1. Dictionary Gather/Scatter Loop Unrolling Optimization
(`parquet/src/encodings/rle.rs`)
2. BitReader Code Generation Optimization (`parquet/src/util/bit_util.rs`)
## Performance Comparison Results
| Test Scenario | Original Version | Optimized Version | Improvement |
|---------|---------|---------|---------|
| dict16_vals65536 | 22.85 µs | 20.32 µs | **~11.1%** |
| dict256_vals65536 | 23.53 µs | 21.35 µs | **~9.3%** |
| dict4096_vals65536 | 27.24 µs | 24.52 µs | **~10.0%** |
| dict256_vals1048576_large | 369.3 µs | 337.3 µs | **~8.7%** |
### Detailed Data
#### 1. dict16_vals65536 (Dictionary Size: 16, Value Count: 65536)
```
Original Version: time: [22.504 µs 22.855 µs 23.353 µs]
Optimized Version: time: [20.236 µs 20.317 µs 20.406 µs]
Improvement: ~11.1%
```
#### 2. dict256_vals65536 (Dictionary Size: 256, Value Count: 65536)
```
Original Version: time: [23.251 µs 23.526 µs 23.987 µs]
Optimized Version: time: [21.124 µs 21.354 µs 21.645 µs]
Improvement: ~9.3%
```
#### 3. dict4096_vals65536 (Dictionary Size: 4096, Value Count: 65536)
```
Original Version: time: [27.004 µs 27.236 µs 27.513 µs]
Optimized Version: time: [24.426 µs 24.519 µs 24.638 µs]
Improvement: ~10.0%
```
#### 4. dict256_vals1048576_large (Dictionary Size: 256, Value Count: 1,048,576)
```
Original Version: time: [368.21 µs 369.30 µs 370.68 µs]
Optimized Version: time: [335.43 µs 337.28 µs 339.46 µs]
Improvement: ~8.7%
```
## Optimization Details
### 1. Dictionary Gather/Scatter Loop Optimization
**File**: `parquet/src/encodings/rle.rs`
**Changes**:
- Increased loop unrolling from 8-element batches to 16-element batches
- Moved bounds checking from inside the loop to outside (using
`debug_assert!`)
- Used `get_unchecked` to avoid repeated bounds checking
- Separated into three levels: 16-element, 8-element, and remainder
processing
**Code Example**:
```rust
// Before Optimization
for (out_chunk, idx_chunk) in out_chunks.by_ref().zip(idx_chunks) {
let dict_len = dict.len();
assert!(idx_chunk.iter().all(|&i| (i as usize) < dict_len));
for (b, i) in out_chunk.iter_mut().zip(idx_chunk.iter()) {
b.clone_from(unsafe { dict.get_unchecked(*i as usize) });
}
}
// After Optimization
debug_assert!(idx.iter().all(|&i| (i as usize) < dict_len));
for (out_chunk, idx_chunk) in out_chunks.by_ref().zip(idx_chunks.by_ref()) {
unsafe {
let i0 = *idx_chunk.get_unchecked(0) as usize;
// ... Unroll 16 elements
out_chunk.get_unchecked_mut(0).clone_from(dict.get_unchecked(i0));
// ... Unroll 16 assignments
}
}
```
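For reference, here is a self-contained sketch of this pattern (4-way instead of 16-way unrolling for brevity, and with a release-mode `assert!` up front so the unchecked loop stays sound even on untrusted indices; `gather` and its signature are illustrative, not the actual `rle.rs` code):

```rust
/// Illustrative standalone gather: out[i] = dict[idx[i]].clone()
fn gather<T: Clone>(dict: &[T], idx: &[u16], out: &mut [T]) {
    assert_eq!(idx.len(), out.len());
    // One up-front validation, amortized over the whole batch, lets the
    // hot loop use unchecked indexing without becoming unsound.
    assert!(idx.iter().all(|&i| (i as usize) < dict.len()));

    let mut out_chunks = out.chunks_exact_mut(4);
    let mut idx_chunks = idx.chunks_exact(4);
    for (out_chunk, idx_chunk) in out_chunks.by_ref().zip(idx_chunks.by_ref()) {
        // SAFETY: all indices were validated above; chunks are exactly 4 long.
        unsafe {
            let i0 = *idx_chunk.get_unchecked(0) as usize;
            let i1 = *idx_chunk.get_unchecked(1) as usize;
            let i2 = *idx_chunk.get_unchecked(2) as usize;
            let i3 = *idx_chunk.get_unchecked(3) as usize;
            out_chunk.get_unchecked_mut(0).clone_from(dict.get_unchecked(i0));
            out_chunk.get_unchecked_mut(1).clone_from(dict.get_unchecked(i1));
            out_chunk.get_unchecked_mut(2).clone_from(dict.get_unchecked(i2));
            out_chunk.get_unchecked_mut(3).clone_from(dict.get_unchecked(i3));
        }
    }
    // Remainder: plain checked indexing is fine off the hot path.
    for (b, &i) in out_chunks
        .into_remainder()
        .iter_mut()
        .zip(idx_chunks.remainder())
    {
        b.clone_from(&dict[i as usize]);
    }
}
```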
### 2. BitReader Code Generation Optimization
**File**: `parquet/src/util/bit_util.rs`
**Changes**:
- Used `T::from_u64()` to directly construct values, avoiding buffer
allocation and slice copying
- Reduced temporary variable creation
**Code Example**:
```rust
// Before Optimization
for out in out_buf {
let mut out_bytes = T::Buffer::default();
out_bytes.as_mut()[..4].copy_from_slice(&out.to_le_bytes());
batch[i] = T::from_le_bytes(out_bytes);
i += 1;
}
// After Optimization
for out in out_buf {
batch[i] = T::from_u64(out as u64);
i += 1;
}
```
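To make the difference concrete, here is a stripped-down stand-in for the two paths (the `FromU64` trait and the function names are illustrative, not the actual parquet traits):

```rust
/// Minimal stand-in for the decoder's target-type trait.
trait FromU64: Sized {
    /// Construct the value directly from the decoded u64, truncating as needed.
    fn from_u64(v: u64) -> Self;
}

impl FromU64 for u32 {
    fn from_u64(v: u64) -> Self {
        v as u32
    }
}

impl FromU64 for u64 {
    fn from_u64(v: u64) -> Self {
        v
    }
}

/// Old path: round-trip every value through a temporary byte buffer.
fn decode_via_bytes(out_buf: &[u64], batch: &mut [u32]) {
    for (slot, &out) in batch.iter_mut().zip(out_buf) {
        let mut out_bytes = [0u8; 4];
        out_bytes.copy_from_slice(&(out as u32).to_le_bytes());
        *slot = u32::from_le_bytes(out_bytes);
    }
}

/// New path: construct the value directly, no temporary buffer.
fn decode_direct<T: FromU64>(out_buf: &[u64], batch: &mut [T]) {
    for (slot, &out) in batch.iter_mut().zip(out_buf) {
        *slot = T::from_u64(out);
    }
}
```

Both paths produce identical values; the direct construction simply removes the per-element buffer and slice copy from the loop body.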
## Conclusion
The two optimizations together yield a roughly 9-11% performance improvement, a meaningful gain on the dictionary-decoding hot path. The effect is stable across different dictionary sizes and data volumes.
---
*Generated by benchmark test on 2026-04-15*
If this looks reasonable, I can submit a PR with these changes. @Dandandan