scovich commented on code in PR #9689:
URL: https://github.com/apache/arrow-rs/pull/9689#discussion_r3073335039
##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -246,6 +254,18 @@ where
precision,
scale,
),
+ Variant::Float(f) => single_float_to_decimal::<O>(f64::from(*f), mul),
+ Variant::Double(f) => single_float_to_decimal::<O>(*f, mul),
+ Variant::String(v) if scale > 0 => parse_string_to_decimal_native::<O>(
+ v,
+ <i8 as TryInto<usize>>::try_into(scale).expect("scale is positive,
would never fail"),
Review Comment:
`as _` seems appropriate for cases like this?
(sorry if my earlier comments gave the impression that it was outright
bad/forbidden)
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -89,7 +89,7 @@ where
D: DecimalType,
F: Fn(D::Native) -> f64,
{
- f(x) / 10_f64.powi(scale)
+ f(x) * 10_f64.powi(-scale)
Review Comment:
Rescuing https://github.com/apache/arrow-rs/pull/9689#discussion_r3065529446
from github oblivion...
I just remembered that floating point `a * 10**-b` is technically _NOT_
equivalent to `a / 10**b`. The [algebraic
operators](https://doc.rust-lang.org/std/primitive.f32.html#algebraic-operators)
section of rust docs talks about it:
> Algebraic operators of the form `a.algebraic_*(b)` allow the compiler to
_**optimize floating point operations using all the usual algebraic properties
of real numbers – despite the fact that those properties do not hold on
floating point numbers**_. This can give a great performance boost since it may
unlock vectorization.
>
> The exact set of optimizations is unspecified but typically allows
combining operations, rearranging series of operations based on mathematical
properties, **_converting between division and reciprocal multiplication_**,
and disregarding the sign of zero.
(emphasis mine)
Whether we think any difference matters for this specific case... I don't
know. But we should probably defer a change like this to its own PR with
appropriate performance evaluation and weighing of trade-offs.
At least there's now a narrow waist for such a future optimization to be
made easily.
##########
parquet-variant/src/variant.rs:
##########
@@ -294,6 +302,39 @@ pub enum Variant<'m, 'v> {
// We don't want this to grow because it could hurt performance of a
frequently-created type.
const _: () = crate::utils::expect_size_of::<Variant>(80);
+enum NumericKind {
+ Integer,
+ Float,
+}
+
+trait DecimalCastTarget: NumCast + Default {
+ const KIND: NumericKind;
+ fn arrow_type() -> DataType;
+}
+
+macro_rules! impl_decimal_cast_target {
+ ($raw_type: ident, $target_kind:expr, $arrow_type: expr) => {
+ impl DecimalCastTarget for $raw_type {
+ const KIND: NumericKind = $target_kind;
+ fn arrow_type() -> DataType {
+ $arrow_type
+ }
Review Comment:
Out of curiosity, why a function for this one, instead of a `const` like the
other uses?
```suggestion
const ARROW_TYPE: DataType = $arrow_type;
```
##########
parquet-variant/src/variant.rs:
##########
@@ -1199,14 +1320,27 @@ impl<'m, 'v> Variant<'m, 'v> {
/// let v1 = Variant::from(VariantDecimal4::try_new(1234_i32, 2).unwrap());
/// assert_eq!(v1.as_decimal16(), VariantDecimal16::try_new(1234_i128,
2).ok());
///
+ /// // or from a string variant if it can be parsed as decimal
+ /// let v2 = Variant::from("123.45");
+ /// assert_eq!(v2.as_decimal16(), VariantDecimal16::try_new(12345,
2).ok());
+ ///
/// // but not if the variant is not a decimal
- /// let v2 = Variant::from("hello!");
- /// assert_eq!(v2.as_decimal16(), None);
+ /// let v3 = Variant::from("hello!");
+ /// assert_eq!(v3.as_decimal16(), None);
/// ```
pub fn as_decimal16(&self) -> Option<VariantDecimal16> {
match *self {
- Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) |
Variant::Int64(_) => {
- self.as_num::<i128>().and_then(|x| x.try_into().ok())
+ Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) |
Variant::Int64(_) => self
+ .as_num::<i64>()
+ .map(|x| <i128 as From<i64>>::from(x).try_into().ok())
+ .unwrap(),
Review Comment:
Why `unwrap` an option in a function that returns `Option?
```suggestion
Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) |
Variant::Int64(_) => {
let x = self.as_num::<i64>()?;
i128::from(x).try_into().ok()
}
```
Also: the `<i128 as From<i64>>` is surprising -- does the compiler actually
need it for some reason?
##########
parquet-variant/src/variant.rs:
##########
@@ -815,9 +883,21 @@ impl<'m, 'v> Variant<'m, 'v> {
Variant::Int64(i) => num_cast(i),
Variant::Float(f) => num_cast(f),
Variant::Double(d) => num_cast(d),
- Variant::Decimal4(d) if d.scale() == 0 => num_cast(d.integer()),
- Variant::Decimal8(d) if d.scale() == 0 => num_cast(d.integer()),
- Variant::Decimal16(d) if d.scale() == 0 => num_cast(d.integer()),
+ Variant::Decimal4(d) => {
+ Self::cast_decimal_to_num::<Decimal32Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
+ Variant::Decimal8(d) => {
+ Self::cast_decimal_to_num::<Decimal64Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
+ Variant::Decimal16(d) => {
+ Self::cast_decimal_to_num::<Decimal128Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
Review Comment:
aside: Very odd choice by `fmt`... I would have expected
```suggestion
Variant::Decimal16(d) =>
Self::cast_decimal_to_num::<Decimal128Type, T, _>(
d.integer(),
d.scale(),
|x| x as f64,
)
```
🤷
##########
parquet-variant/src/variant.rs:
##########
@@ -1106,6 +1186,23 @@ impl<'m, 'v> Variant<'m, 'v> {
self.as_num()
}
+ fn convert_string_to_decimal<D, VD>(input: &str) -> Option<VD>
+ where
+ D: DecimalType,
+ VD: VariantDecimalType<Native = D::Native>,
+ D::Native: NumCast + DecimalCast,
+ {
+ // find the last '.'
+ let scale_usize = input
+ .rsplit_once('.')
+ .map_or_else(|| 0, |(_, frac)| frac.len());
Review Comment:
```suggestion
.map_or(0, |(_, frac)| frac.len());
```
(sorry, I mixed up the two names in my earlier suggestion)
I'm actually surprised clippy didn't flag it.
##########
parquet-variant/src/variant.rs:
##########
@@ -797,14 +838,41 @@ impl<'m, 'v> Variant<'m, 'v> {
}
}
- /// Converts a boolean or numeric variant(integers, floating-point, and
decimals with scale 0)
+ fn cast_decimal_to_num<D, T, F>(raw: D::Native, scale: u8, as_float: F) ->
Option<T>
+ where
+ D: DecimalType,
+ D::Native: NumCast + ArrowNativeTypeOp,
+ T: DecimalCastTarget,
+ F: Fn(D::Native) -> f64,
+ {
+ let base: D::Native = NumCast::from(10)?;
+
+ base.pow_checked(<u32 as From<u8>>::from(scale))
+ .ok()
+ .and_then(|div| match T::KIND {
Review Comment:
Another place where `and_then` hurts readability
```rust
let div = base.pow_checked(<u32 as From<u8>>::from(scale)).ok()?;
match T::KIND {
...
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]