alamb commented on code in PR #9662:
URL: https://github.com/apache/arrow-rs/pull/9662#discussion_r3041447315


##########
parquet/src/util/bit_util.rs:
##########
@@ -46,6 +46,17 @@ pub unsafe trait FromBytes: Sized {
     fn from_le_bytes(bs: Self::Buffer) -> Self;
 }
 
+/// Types that can be decoded from bitpacked representations.
+///
+/// This is implemented for primitive types and bool that can be
+/// directly converted from a u64 value. Types like Int96, ByteArray,
+/// and FixedLenByteArray that cannot be represented in 64 bits do not
+/// implement this trait.
+pub trait FromBitpacked: FromBytes {
+    /// Convert directly from a u64 value by truncation, avoiding byte slice copies.
+    fn from_u64(v: u64) -> Self;

Review Comment:
   It seems to me that this differs from `FromBytes` because the width of the 
source is known (always 64 bits) rather than being a slice.
   
   
https://github.com/apache/arrow-rs/blob/acdbbe05cf2d7546415bb4859556a8da29d562fe/parquet/src/util/bit_util.rs#L45-L44
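
   A rough sketch of the distinction as I read it. The traits `FromLeBuffer` and `FromU64`, and the helpers `decode_via_buffer` and `decode_direct`, are hypothetical stand-ins written for illustration; they are not the actual definitions in `bit_util.rs`.

```rust
/// Hypothetical stand-in for a buffer-based conversion: the source is a
/// byte buffer whose length depends on the target type.
trait FromLeBuffer: Sized {
    type Buffer: AsMut<[u8]> + Default;
    fn from_le_buffer(bs: Self::Buffer) -> Self;
}

/// Hypothetical stand-in for a fixed-width conversion: the source is
/// always exactly 64 bits wide.
trait FromU64: Sized {
    fn from_u64(v: u64) -> Self;
}

impl FromLeBuffer for i32 {
    type Buffer = [u8; 4];
    fn from_le_buffer(bs: [u8; 4]) -> Self {
        i32::from_le_bytes(bs)
    }
}

impl FromU64 for i32 {
    fn from_u64(v: u64) -> Self {
        // Truncating cast: keeps the low 32 bits.
        v as i32
    }
}

/// Buffer route: stage the low bytes of `v` in a temporary buffer,
/// then convert from that buffer.
fn decode_via_buffer<T: FromLeBuffer>(v: u64) -> T {
    let mut buf: T::Buffer = Default::default();
    let dst = buf.as_mut();
    let n = dst.len().min(8);
    dst[..n].copy_from_slice(&v.to_le_bytes()[..n]);
    T::from_le_buffer(buf)
}

/// Direct route: no intermediate buffer, just a cast from the u64.
fn decode_direct<T: FromU64>(v: u64) -> T {
    T::from_u64(v)
}
```

   For `i32` both routes keep the low 32 bits of the input, so they should agree; the direct route simply skips the staging copy.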



##########
parquet/src/encodings/rle.rs:
##########
@@ -352,7 +352,7 @@ impl RleDecoder {
     // that damage L1d-cache occupancy. This results in a ~18% performance drop
     #[inline(never)]
     #[allow(unused)]

Review Comment:
   Drive by -- I wonder if this is still "unused" (maybe for a follow-on PR)



##########
parquet/src/util/bit_util.rs:
##########
@@ -46,6 +46,17 @@ pub unsafe trait FromBytes: Sized {
     fn from_le_bytes(bs: Self::Buffer) -> Self;
 }
 
+/// Types that can be decoded from bitpacked representations.
+///
+/// This is implemented for primitive types and bool that can be
+/// directly converted from a u64 value. Types like Int96, ByteArray,
+/// and FixedLenByteArray that cannot be represented in 64 bits do not
+/// implement this trait.
+pub trait FromBitpacked: FromBytes {
+    /// Convert directly from a u64 value by truncation, avoiding byte slice copies.
+    fn from_u64(v: u64) -> Self;

Review Comment:
   I wonder if we should also mark `FromBitpacked` as `unsafe` 🤔 to mirror 
`FromBytes`.
   
   That said, I don't really understand why `FromBytes` is marked as `unsafe` 
to begin with.
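
   For context, a generic sketch of when an `unsafe trait` is usually warranted in Rust: some `unsafe` code relies on an invariant that implementors promise to uphold. Everything here (`AnyBitPattern`, `read_unaligned_from`, `FromU64Sketch`) is hypothetical and is not a claim about why `FromBytes` is `unsafe` in this crate.

```rust
/// Hypothetical marker trait. Implementors promise that every bit pattern
/// of `size_of::<Self>()` bytes is a valid value of `Self`.
unsafe trait AnyBitPattern: Copy {}

unsafe impl AnyBitPattern for u32 {}
unsafe impl AnyBitPattern for i64 {}
// Deliberately not implemented for `bool`: only 0 and 1 are valid bytes.

/// Reads a `T` from the front of `bytes` in native byte order.
fn read_unaligned_from<T: AnyBitPattern>(bytes: &[u8]) -> Option<T> {
    if bytes.len() < std::mem::size_of::<T>() {
        return None;
    }
    // SAFETY: the length was checked above, `read_unaligned` has no
    // alignment requirement, and `AnyBitPattern` promises that any bytes
    // form a valid `T`.
    Some(unsafe { std::ptr::read_unaligned(bytes.as_ptr() as *const T) })
}

/// By contrast, a conversion built only from safe casts has no invariant
/// for unsafe code to rely on, so a safe trait is enough for it.
trait FromU64Sketch {
    fn from_u64(v: u64) -> Self;
}

impl FromU64Sketch for u32 {
    fn from_u64(v: u64) -> Self {
        v as u32
    }
}

impl FromU64Sketch for bool {
    fn from_u64(v: u64) -> Self {
        v != 0
    }
}
```

   Under that reading, whether `FromBitpacked` needs `unsafe` would depend on whether any `unsafe` code ends up trusting its implementors.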



##########
parquet/src/encodings/rle.rs:
##########
@@ -507,10 +507,30 @@ impl RleDecoder {
                         self.bit_packed_left = 0;
                         break;
                     }
-                    buffer[values_read..values_read + num_values]
-                        .iter_mut()
-                        .zip(index_buf[..num_values].iter())
-                        .for_each(|(b, i)| b.clone_from(&dict[*i as usize]));
+                    {

Review Comment:
   Is this extra level of indent needed? If so, can we comment about what 
purpose it is serving?



##########
parquet/src/util/bit_util.rs:
##########
@@ -79,6 +134,13 @@ unsafe impl FromBytes for bool {
     }
 }
 
+impl FromBitpacked for bool {
+    #[inline]
+    fn from_u64(v: u64) -> Self {
+        v != 0

Review Comment:
   If we are really decoding boolean values one at a time from a bitpacked 
source, I think we could probably do a lot better by just copying the bits 
directly 🤔
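
   A minimal sketch of what I mean, assuming 1-bit values packed LSB-first, a byte-aligned start, and a destination that is itself a packed bitmap. Both function names are hypothetical and neither is taken from the crate.

```rust
/// One value at a time: extract each bit, widen it to u64, then compare,
/// which mirrors the shape of a `from_u64`-style conversion.
fn unpack_bools_one_at_a_time(src: &[u8], n: usize) -> Vec<bool> {
    (0..n)
        .map(|i| {
            let v = ((src[i / 8] >> (i % 8)) & 1) as u64;
            v != 0
        })
        .collect()
}

/// Copying the bits directly: when the output is also a bitmap with the
/// same bit order, whole bytes can be copied and only the unused bits of
/// the final byte need to be cleared.
fn copy_packed_bits(src: &[u8], n: usize) -> Vec<u8> {
    let nbytes = (n + 7) / 8;
    let mut out = src[..nbytes].to_vec();
    let rem = n % 8;
    if rem != 0 {
        if let Some(last) = out.last_mut() {
            // Keep only the low `rem` bits of the last byte.
            *last &= (1u8 << rem) - 1;
        }
    }
    out
}
```

   The second form touches each byte once instead of each bit once. It only applies when the consumer wants a bitmap rather than a `Vec<bool>`, and an unaligned start would need an extra shift step that is not shown here.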


