Copilot commented on code in PR #9662:
URL: https://github.com/apache/arrow-rs/pull/9662#discussion_r3035377116
##########
parquet/src/util/bit_util.rs:
##########
@@ -59,12 +61,48 @@ macro_rules! from_le_bytes {
fn from_le_bytes(bs: Self::Buffer) -> Self {
<$ty>::from_le_bytes(bs)
}
+ #[inline]
+ fn from_u64(v: u64) -> Self {
+ v as Self
+ }
}
)*
};
}
-from_le_bytes! { u8, u16, u32, u64, i8, i16, i32, i64, f32, f64 }
+from_le_bytes! { u8, u16, u32, u64, i8, i16, i32, i64 }
+
+// SAFETY: all bit patterns are valid for f32 and f64.
+unsafe impl FromBytes for f32 {
+ const BIT_CAPACITY: usize = 32;
+ type Buffer = [u8; 4];
+ fn try_from_le_slice(b: &[u8]) -> Result<Self> {
+ Ok(Self::from_le_bytes(array_from_slice(b)?))
+ }
+ fn from_le_bytes(bs: Self::Buffer) -> Self {
+ f32::from_le_bytes(bs)
+ }
+ #[inline]
+ fn from_u64(v: u64) -> Self {
+ f32::from_bits(v as u32)
+ }
+}
+
+// SAFETY: all bit patterns are valid for f64.
+unsafe impl FromBytes for f64 {
+ const BIT_CAPACITY: usize = 64;
+ type Buffer = [u8; 8];
+ fn try_from_le_slice(b: &[u8]) -> Result<Self> {
+ Ok(Self::from_le_bytes(array_from_slice(b)?))
+ }
+ fn from_le_bytes(bs: Self::Buffer) -> Self {
+ f64::from_le_bytes(bs)
+ }
+ #[inline]
+ fn from_u64(v: u64) -> Self {
+ f64::from_bits(v)
+ }
+}
Review Comment:
New float-specific `FromBytes` implementations now rely on `from_u64` using
`from_bits`, but there are no unit tests in this module exercising
`BitReader::get_value::<f32/f64>` (including NaN payload / sign-bit cases).
Adding targeted tests would help ensure the bit-preserving behavior remains
correct across refactors.
##########
parquet/src/util/bit_util.rs:
##########
@@ -445,8 +496,7 @@ impl BitReader {
}
}
- // TODO: better to avoid copying here
- T::try_from_le_slice(v.as_bytes()).ok()
+ Some(T::from_u64(v))
Review Comment:
`BitReader::get_value` now unconditionally returns `Some(T::from_u64(v))`,
which removes the previous fallible conversion behavior. This can introduce new
panics for `T` implementations where `from_u64` is not meaningful (e.g.,
`ByteArray`/`FixedLenByteArray`/`Int96` currently `unreachable!()`), and also
changes behavior for any type that previously could fail `try_from_le_slice`.
Consider either (a) keeping a fallible path (e.g., fallback to
`T::try_from_le_slice(&v.to_le_bytes()).ok()`) for unsupported types, or (b)
explicitly restricting `get_value` to types representable by `u64` (e.g.,
`size_of::<T>() <= 8` and/or `T::BIT_CAPACITY != 0`) with a clearer contract.
```suggestion
T::try_from_le_slice(&v.to_le_bytes()).ok()
```
##########
parquet/src/util/bit_util.rs:
##########
@@ -130,6 +178,9 @@ unsafe impl FromBytes for FixedLenByteArray {
fn from_le_bytes(bs: Self::Buffer) -> Self {
bs.into()
}
+ fn from_u64(_v: u64) -> Self {
+ unreachable!("FixedLenByteArray does not support from_u64")
+ }
Review Comment:
`from_u64` for `Int96`, `ByteArray`, and `FixedLenByteArray` is implemented
with `unreachable!()`, but these types already have working
`try_from_le_slice`/`from_le_bytes` implementations. Because `from_u64` is part
of a public trait, this turns previously valid calls into runtime panics.
Prefer either implementing `from_u64` in a way that preserves the prior
semantics (e.g., constructing from `v.to_le_bytes()` for byte-array types), or
redesigning so `get_value` can’t call `from_u64` for these types.
##########
parquet/src/util/bit_util.rs:
##########
@@ -44,6 +44,8 @@ pub unsafe trait FromBytes: Sized {
type Buffer: AsMut<[u8]> + Default;
fn try_from_le_slice(b: &[u8]) -> Result<Self>;
fn from_le_bytes(bs: Self::Buffer) -> Self;
+ /// Convert directly from a u64 value by truncation, avoiding byte slice
copies.
+ fn from_u64(v: u64) -> Self;
Review Comment:
`FromBytes` is a `pub` trait, and adding a new required method (`from_u64`)
is a semver-breaking API change for downstream implementors. Consider providing
a default implementation of `from_u64` (e.g., based on
`try_from_le_slice(&v.to_le_bytes())`) so external impls keep compiling, while
allowing performance overrides for primitives.
```suggestion
fn from_u64(v: u64) -> Self {
Self::try_from_le_slice(&v.to_le_bytes())
.expect("u64 should contain enough bytes to construct value")
}
```
##########
parquet/src/util/bit_util.rs:
##########
@@ -44,6 +44,8 @@ pub unsafe trait FromBytes: Sized {
type Buffer: AsMut<[u8]> + Default;
fn try_from_le_slice(b: &[u8]) -> Result<Self>;
fn from_le_bytes(bs: Self::Buffer) -> Self;
+ /// Convert directly from a u64 value by truncation, avoiding byte slice
copies.
Review Comment:
The doc comment for `FromBytes::from_u64` says “by truncation”, but some
implementations intentionally preserve bit patterns (`f32`/`f64` via
`from_bits`) or interpret as boolean (`v != 0`). Consider updating the comment
to describe the general contract (conversion from the bit-buffer value
extracted by `BitReader`) instead of truncation-specific wording.
```suggestion
/// Convert directly from the `u64` bit-buffer value extracted by
`BitReader`,
/// avoiding byte slice copies. The exact interpretation of `v` is
type-specific.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]