alamb commented on code in PR #9662:
URL: https://github.com/apache/arrow-rs/pull/9662#discussion_r3041447315
##########
parquet/src/util/bit_util.rs:
##########
@@ -46,6 +46,17 @@ pub unsafe trait FromBytes: Sized {
fn from_le_bytes(bs: Self::Buffer) -> Self;
}
+/// Types that can be decoded from bitpacked representations.
+///
+/// This is implemented for primitive types and bool that can be
+/// directly converted from a u64 value. Types like Int96, ByteArray,
+/// and FixedLenByteArray that cannot be represented in 64 bits do not
+/// implement this trait.
+pub trait FromBitpacked: FromBytes {
+ /// Convert directly from a u64 value by truncation, avoiding byte slice copies.
+ fn from_u64(v: u64) -> Self;
Review Comment:
It seems to me that this differs from `FromBytes` because the width of the
source is known (always 64 bits) rather than being a slice.
https://github.com/apache/arrow-rs/blob/acdbbe05cf2d7546415bb4859556a8da29d562fe/parquet/src/util/bit_util.rs#L45-L44
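To make the distinction concrete, here is a minimal standalone sketch. The trait names `FromLeSlice` and `FromU64` are hypothetical simplifications, not the actual arrow-rs traits. The slice-based path only learns the source length at runtime, so it must check it and copy into a fixed buffer. The `u64`-based path knows its source width statically and can truncate with a plain `as` cast.

```rust
/// Hypothetical slice-based decoding: the source length is only known at runtime.
trait FromLeSlice: Sized {
    fn from_le_slice(b: &[u8]) -> Option<Self>;
}

/// Hypothetical fixed-width decoding: the source is always exactly 64 bits.
trait FromU64: Sized {
    fn from_u64(v: u64) -> Self;
}

impl FromLeSlice for u32 {
    fn from_le_slice(b: &[u8]) -> Option<Self> {
        // Reject inputs wider than the target type.
        if b.len() > 4 {
            return None;
        }
        // Copy into a zeroed buffer: this is the byte-slice copy that the
        // fixed-width path avoids.
        let mut buf = [0u8; 4];
        buf[..b.len()].copy_from_slice(b);
        Some(u32::from_le_bytes(buf))
    }
}

impl FromU64 for u32 {
    #[inline]
    fn from_u64(v: u64) -> Self {
        // Keep only the low 32 bits; no slice and no length check needed.
        v as u32
    }
}

impl FromU64 for bool {
    #[inline]
    fn from_u64(v: u64) -> Self {
        v != 0
    }
}
```

For example, `<u32 as FromU64>::from_u64(0x1_0000_002A)` drops bit 32 and yields `42`.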
##########
parquet/src/encodings/rle.rs:
##########
@@ -352,7 +352,7 @@ impl RleDecoder {
// that damage L1d-cache occupancy. This results in a ~18% performance drop
#[inline(never)]
#[allow(unused)]
Review Comment:
Drive-by: I wonder if this is still "unused" (maybe for a follow-on PR).
##########
parquet/src/util/bit_util.rs:
##########
@@ -46,6 +46,17 @@ pub unsafe trait FromBytes: Sized {
fn from_le_bytes(bs: Self::Buffer) -> Self;
}
+/// Types that can be decoded from bitpacked representations.
+///
+/// This is implemented for primitive types and bool that can be
+/// directly converted from a u64 value. Types like Int96, ByteArray,
+/// and FixedLenByteArray that cannot be represented in 64 bits do not
+/// implement this trait.
+pub trait FromBitpacked: FromBytes {
+ /// Convert directly from a u64 value by truncation, avoiding byte slice copies.
+ fn from_u64(v: u64) -> Self;
Review Comment:
I wonder if we should also mark `FromBitpacked` as `unsafe` 🤔 to mirror
`FromBytes`.
That being said, I don't really understand why `FromBytes` is marked as
`unsafe` to begin with.
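For background on the `unsafe` question (general Rust semantics, not a claim about why arrow-rs marks `FromBytes` this way): an `unsafe trait` expresses an invariant that *other* unsafe code relies on. Implementors must write `unsafe impl`, so an incorrect impl is the implementor's soundness bug rather than the caller's. The sketch below uses a hypothetical `AnyBitPattern` trait whose contract ("every bit pattern of `size_of::<Self>()` bytes is a valid value") is what makes the `ptr::read_unaligned` in a generic reader sound.

```rust
use std::mem::size_of;

/// Hypothetical marker trait (not the arrow-rs definition).
///
/// # Safety
/// Implementors promise that every bit pattern of `size_of::<Self>()` bytes
/// is a valid value of `Self`.
unsafe trait AnyBitPattern: Copy {}

// SAFETY: every 4-byte pattern is a valid u32 / i32.
unsafe impl AnyBitPattern for u32 {}
unsafe impl AnyBitPattern for i32 {}
// `bool` must NOT implement this: only the bytes 0 and 1 are valid bools.

/// Reads a `T` from the start of `bytes` in native endianness.
/// Returns `None` if there are not enough bytes.
fn read_value<T: AnyBitPattern>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < size_of::<T>() {
        return None;
    }
    // SAFETY: the length check guarantees `size_of::<T>()` readable bytes,
    // `read_unaligned` tolerates any alignment, and the `AnyBitPattern`
    // contract guarantees those bytes form a valid `T`.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const T) })
}
```

By contrast, a `from_u64` built only from safe `as` casts or `v != 0` has no invariant for unsafe callers to rely on. Marking it `unsafe` would mainly matter if some unsafe code depended on its output.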
##########
parquet/src/encodings/rle.rs:
##########
@@ -507,10 +507,30 @@ impl RleDecoder {
self.bit_packed_left = 0;
break;
}
- buffer[values_read..values_read + num_values]
- .iter_mut()
- .zip(index_buf[..num_values].iter())
- .for_each(|(b, i)| b.clone_from(&dict[*i as usize]));
+ {
Review Comment:
Is this extra level of indent needed? If so, can we add a comment explaining
what purpose it serves?
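For reference, the lines removed in this hunk perform a dictionary gather: each decoded index selects an entry from `dict`, which is cloned into the output buffer. Below is a standalone sketch of that loop. The function name `gather_from_dict` and its signature are hypothetical, and `i32` indices are an assumption based on the `*i as usize` cast.

```rust
/// Hypothetical standalone version of the removed loop: for each of the first
/// `num_values` indices, clone the referenced dictionary entry into `buffer`
/// starting at position `values_read`.
fn gather_from_dict<T: Clone>(
    buffer: &mut [T],
    values_read: usize,
    index_buf: &[i32],
    num_values: usize,
    dict: &[T],
) {
    buffer[values_read..values_read + num_values]
        .iter_mut()
        .zip(index_buf[..num_values].iter())
        // `clone_from` lets types such as Vec or String reuse the existing
        // allocation in `b` instead of allocating a fresh clone.
        .for_each(|(b, i)| b.clone_from(&dict[*i as usize]));
}
```

Each `dict[*i as usize]` access is bounds-checked and panics on an out-of-range index.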
##########
parquet/src/util/bit_util.rs:
##########
@@ -79,6 +134,13 @@ unsafe impl FromBytes for bool {
}
}
+impl FromBitpacked for bool {
+ #[inline]
+ fn from_u64(v: u64) -> Self {
+ v != 0
Review Comment:
If we are really decoding boolean values one at a time from a bitpacked
source, I think we could probably do a lot better by just copying the bits
directly 🤔
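As a rough illustration of working on the bits directly, this sketch assumes the values are packed one bit per value, least-significant bit first. That is the order Parquet uses for bit-packed runs in the RLE/bit-packing hybrid encoding. Under that assumption, booleans can be read from the packed bytes with a shift and mask, one input byte per outer iteration, without widening each value to a `u64` first. The function name `unpack_bools` is hypothetical.

```rust
/// Hypothetical helper: unpack `out.len()` booleans from LSB-first
/// bit-packed bytes. Panics if `packed` holds fewer than `out.len()` bits.
fn unpack_bools(packed: &[u8], out: &mut [bool]) {
    assert!(packed.len() * 8 >= out.len(), "not enough packed bits");
    // Each input byte supplies up to 8 consecutive output values.
    for (chunk, &byte) in out.chunks_mut(8).zip(packed) {
        for (bit, slot) in chunk.iter_mut().enumerate() {
            // Bit 0 (the least significant bit) is the first value.
            *slot = ((byte >> bit) & 1) != 0;
        }
    }
}
```

If the destination is itself an LSB-first bitmap, as Arrow boolean buffers are, the packed bytes could in principle be copied with no per-value work at all. Only the bit offset and length would need handling.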
--
This is an automated message from the Apache Git Service.