alamb commented on code in PR #2381:
URL: https://github.com/apache/arrow-rs/pull/2381#discussion_r943395500
##########
parquet/src/util/bit_util.rs:
##########
@@ -373,10 +364,15 @@ impl BitReader {
assert!(num_bits <= 64);
assert!(num_bits <= size_of::<T>() * 8);
- if self.byte_offset * 8 + self.bit_offset + num_bits >
self.total_bytes * 8 {
+ if self.byte_offset * 8 + self.bit_offset + num_bits >
self.buffer.len() * 8 {
return None;
}
+ // Only need to populate buffer if byte aligned
Review Comment:
I don't understand this comment -- maybe some more context would help about
why the buffer needs to be loaded if there are no more bits left in the
currrent byte
##########
parquet/src/util/bit_util.rs:
##########
@@ -429,7 +406,7 @@ impl BitReader {
let mut values_to_read = batch.len();
let needed_bits = num_bits * values_to_read;
- let remaining_bits = (self.total_bytes - self.byte_offset) * 8 -
self.bit_offset;
+ let remaining_bits = (self.buffer.len() - self.byte_offset) * 8 -
self.bit_offset;
Review Comment:
I don't think this PR is any better or worse, but what ensures that
self.bit_offset (which can be up to 63) is always less than `self.buffer.len()
- self.byte_offset`?
##########
parquet/src/util/bit_util.rs:
##########
@@ -385,37 +381,18 @@ impl BitReader {
self.byte_offset += 8;
self.bit_offset -= 64;
- self.reload_buffer_values();
- v |= trailing_bits(self.buffered_values, self.bit_offset)
- .wrapping_shl((num_bits - self.bit_offset) as u32);
+ if self.bit_offset != 0 {
Review Comment:
The logic that decides when to reload `self.buffered_values` is confusing to
me as I would normally expect calling `self.load_buffer()` would also reset
`bit_offset` to 0 to match having just reloaded more bits into
`self.buffered_values`
##########
parquet/src/util/bit_util.rs:
##########
@@ -672,8 +632,10 @@ impl BitReader {
})
}
- fn reload_buffer_values(&mut self) {
- let bytes_to_read = cmp::min(self.total_bytes - self.byte_offset, 8);
+ /// Populates `self.buffered_values`
+ #[inline]
+ fn load_buffer(&mut self) {
Review Comment:
Given there is a field called `buffer` and this function is not loading it,
but rather reading *from* it and loading into `buffered_values` I recommend
calling this function something slightly verbose:
```suggestion
/// Loads up to the the next 8 bytes from `self.buffer` at
`self.byte_offfset`
/// into `self.buffered_values`. Reads fewer than 8 bytes if there are
fewer than 8 bytes left
fn load_buffered_values(&mut self) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]