etseidl commented on code in PR #9477:
URL: https://github.com/apache/arrow-rs/pull/9477#discussion_r2873439238
##########
parquet/src/encodings/decoding.rs:
##########
@@ -770,15 +770,44 @@ where
// At this point we have read the deltas to `buffer` we now need to offset
// these to get back to the original values that were encoded
- for v in &mut buffer[read..read + batch_read] {
- // It is OK for deltas to contain "overflowed" values after encoding,
- // e.g. i64::MAX - i64::MIN, so we use `wrapping_add` to "overflow" again and
- // restore original value.
- *v = v
- .wrapping_add(&self.min_delta)
- .wrapping_add(&self.last_value);
-
- self.last_value = *v;
+ //
+ // Optimization: if the bit_width for the miniblock is 0, then we can employ
+ // a faster decoding method than setting `value[i] = value[i-1] + value[i] + min_delta`.
+ // Where min_delta is 0 (all values in the miniblock are the same), we can simply
+ // set all values to `self.last_value`. In the case of non-zero min_delta (values
+ // in the mini-block form an arithmetic progression) each value can be computed via
+ // `value[i] = (i + 1) * min_delta + last_value`. In both cases we remove the
+ // dependence on the preceding value.
+ // Kudos to @pitrou for the idea https://github.com/apache/arrow/pull/49296
+ if bit_width == 0 {
+ let min_delta = self.min_delta.as_i64()?;
+ if min_delta == 0 {
+ for v in &mut buffer[read..read + batch_read] {
Review Comment:
I had done that, but on my laptop it was actually slower for some
reason. I'll check again on a more modern processor. I do agree it would
look nicer.
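
For readers following the diff, here is a minimal standalone sketch of the
two decode strategies described in the code comment above. It is not the
code from the PR: the function names `decode_sequential` and
`decode_zero_width` are hypothetical, and plain `i64` stands in for the
decoder's generic value type. It assumes the buffer holds a miniblock
whose bit_width is 0, so every stored delta is 0.

```rust
/// Reference path: each output depends on the previously decoded value.
fn decode_sequential(buffer: &mut [i64], min_delta: i64, last_value: &mut i64) {
    for v in buffer.iter_mut() {
        // Wrapping arithmetic restores values whose deltas overflowed on encode.
        *v = v.wrapping_add(min_delta).wrapping_add(*last_value);
        *last_value = *v;
    }
}

/// Fast path, valid only when every stored delta in `buffer` is 0
/// (bit_width == 0). No iteration depends on the previous output.
fn decode_zero_width(buffer: &mut [i64], min_delta: i64, last_value: &mut i64) {
    if buffer.is_empty() {
        return;
    }
    if min_delta == 0 {
        // All values in the miniblock equal the last decoded value.
        for v in buffer.iter_mut() {
            *v = *last_value;
        }
    } else {
        // Arithmetic progression: value[i] = (i + 1) * min_delta + last_value.
        let base = *last_value;
        for (i, v) in buffer.iter_mut().enumerate() {
            *v = (i as i64)
                .wrapping_add(1)
                .wrapping_mul(min_delta)
                .wrapping_add(base);
        }
        *last_value = buffer[buffer.len() - 1];
    }
}
```

Because wrapping addition and multiplication are both arithmetic modulo
2^64, the closed form should agree with the sequential loop even when
intermediate values overflow. Whether it is actually faster depends on the
target CPU and on how the compiler vectorizes each loop, so it needs
benchmarking.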