scovich commented on code in PR #9689:
URL: https://github.com/apache/arrow-rs/pull/9689#discussion_r3073335039


##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -246,6 +254,18 @@ where
             precision,
             scale,
         ),
+        Variant::Float(f) => single_float_to_decimal::<O>(f64::from(*f), mul),
+        Variant::Double(f) => single_float_to_decimal::<O>(*f, mul),
+        Variant::String(v) if scale > 0 => parse_string_to_decimal_native::<O>(
+            v,
+            <i8 as TryInto<usize>>::try_into(scale).expect("scale is positive, 
would never fail"),

Review Comment:
   `as _` seems appropriate for cases like this?
   
   (sorry if my earlier comments gave the impression that it was outright 
bad/forbidden)



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -89,7 +89,7 @@ where
     D: DecimalType,
     F: Fn(D::Native) -> f64,
 {
-    f(x) / 10_f64.powi(scale)
+    f(x) * 10_f64.powi(-scale)

Review Comment:
   Rescuing https://github.com/apache/arrow-rs/pull/9689#discussion_r3065529446 
from github oblivion...
   
   I just remembered that floating point `a * 10**-b` is technically _NOT_ 
equivalent to `a / 10**b`. The [algebraic 
operators](https://doc.rust-lang.org/std/primitive.f32.html#algebraic-operators)
 section of rust docs talks about it:
   > Algebraic operators of the form `a.algebraic_*(b)` allow the compiler to 
_**optimize floating point operations using all the usual algebraic properties 
of real numbers – despite the fact that those properties do not hold on 
floating point numbers**_. This can give a great performance boost since it may 
unlock vectorization.
   >
   > The exact set of optimizations is unspecified but typically allows 
combining operations, rearranging series of operations based on mathematical 
properties, **_converting between division and reciprocal multiplication_**, 
and disregarding the sign of zero. 
   
   (emphasis mine)
   
   Whether we think any difference matters for this specific case... I don't 
know. But we should probably defer a change like this to its own PR with 
appropriate performance evaluation and weighing of trade-offs.
   
   At least there's now a narrow waist for such a future optimization to be 
made easily.



##########
parquet-variant/src/variant.rs:
##########
@@ -294,6 +302,39 @@ pub enum Variant<'m, 'v> {
 // We don't want this to grow because it could hurt performance of a 
frequently-created type.
 const _: () = crate::utils::expect_size_of::<Variant>(80);
 
+enum NumericKind {
+    Integer,
+    Float,
+}
+
+trait DecimalCastTarget: NumCast + Default {
+    const KIND: NumericKind;
+    fn arrow_type() -> DataType;
+}
+
+macro_rules! impl_decimal_cast_target {
+    ($raw_type: ident, $target_kind:expr, $arrow_type: expr) => {
+        impl DecimalCastTarget for $raw_type {
+            const KIND: NumericKind = $target_kind;
+            fn arrow_type() -> DataType {
+                $arrow_type
+            }

Review Comment:
   Out of curiosity, why a function for this one, instead of a `const` like the 
other uses?
   ```suggestion
               const ARROW_TYPE: DataType = $arrow_type;
   ```



##########
parquet-variant/src/variant.rs:
##########
@@ -1199,14 +1320,27 @@ impl<'m, 'v> Variant<'m, 'v> {
     /// let v1 = Variant::from(VariantDecimal4::try_new(1234_i32, 2).unwrap());
     /// assert_eq!(v1.as_decimal16(), VariantDecimal16::try_new(1234_i128, 
2).ok());
     ///
+    /// // or from a string variant if it can be parsed as decimal
+    /// let v2 = Variant::from("123.45");
+    /// assert_eq!(v2.as_decimal16(), VariantDecimal16::try_new(12345, 
2).ok());
+    ///
     /// // but not if the variant is not a decimal
-    /// let v2 = Variant::from("hello!");
-    /// assert_eq!(v2.as_decimal16(), None);
+    /// let v3 = Variant::from("hello!");
+    /// assert_eq!(v3.as_decimal16(), None);
     /// ```
     pub fn as_decimal16(&self) -> Option<VariantDecimal16> {
         match *self {
-            Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) | 
Variant::Int64(_) => {
-                self.as_num::<i128>().and_then(|x| x.try_into().ok())
+            Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) | 
Variant::Int64(_) => self
+                .as_num::<i64>()
+                .map(|x| <i128 as From<i64>>::from(x).try_into().ok())
+                .unwrap(),

Review Comment:
   Why `unwrap` an option in a function that returns `Option?
   ```suggestion
               Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) | 
Variant::Int64(_) => {
                   let x = self.as_num::<i64>()?;
                   i128::from(x).try_into().ok()
               }
   ```
   
   Also: the `<i128 as From<i64>>` is surprising -- does the compiler actually 
need it for some reason?



##########
parquet-variant/src/variant.rs:
##########
@@ -815,9 +883,21 @@ impl<'m, 'v> Variant<'m, 'v> {
             Variant::Int64(i) => num_cast(i),
             Variant::Float(f) => num_cast(f),
             Variant::Double(d) => num_cast(d),
-            Variant::Decimal4(d) if d.scale() == 0 => num_cast(d.integer()),
-            Variant::Decimal8(d) if d.scale() == 0 => num_cast(d.integer()),
-            Variant::Decimal16(d) if d.scale() == 0 => num_cast(d.integer()),
+            Variant::Decimal4(d) => {
+                Self::cast_decimal_to_num::<Decimal32Type, T, _>(d.integer(), 
d.scale(), |x| {
+                    x as f64
+                })
+            }
+            Variant::Decimal8(d) => {
+                Self::cast_decimal_to_num::<Decimal64Type, T, _>(d.integer(), 
d.scale(), |x| {
+                    x as f64
+                })
+            }
+            Variant::Decimal16(d) => {
+                Self::cast_decimal_to_num::<Decimal128Type, T, _>(d.integer(), 
d.scale(), |x| {
+                    x as f64
+                })
+            }

Review Comment:
   aside: Very odd choice by `fmt`... I would have expected 
   ```suggestion
               Variant::Decimal16(d) => 
Self::cast_decimal_to_num::<Decimal128Type, T, _>(
                   d.integer(), 
                   d.scale(), 
                   |x| x as f64,
               )
   ```
   🤷 



##########
parquet-variant/src/variant.rs:
##########
@@ -1106,6 +1186,23 @@ impl<'m, 'v> Variant<'m, 'v> {
         self.as_num()
     }
 
+    fn convert_string_to_decimal<D, VD>(input: &str) -> Option<VD>
+    where
+        D: DecimalType,
+        VD: VariantDecimalType<Native = D::Native>,
+        D::Native: NumCast + DecimalCast,
+    {
+        // find the last '.'
+        let scale_usize = input
+            .rsplit_once('.')
+            .map_or_else(|| 0, |(_, frac)| frac.len());

Review Comment:
   ```suggestion
               .map_or(0, |(_, frac)| frac.len());
   ```
   (sorry, I mixed up the two names in my earlier suggestion)
   
   I'm actually surprised clippy didn't flag it.



##########
parquet-variant/src/variant.rs:
##########
@@ -797,14 +838,41 @@ impl<'m, 'v> Variant<'m, 'v> {
         }
     }
 
-    /// Converts a boolean or numeric variant(integers, floating-point, and 
decimals with scale 0)
+    fn cast_decimal_to_num<D, T, F>(raw: D::Native, scale: u8, as_float: F) -> 
Option<T>
+    where
+        D: DecimalType,
+        D::Native: NumCast + ArrowNativeTypeOp,
+        T: DecimalCastTarget,
+        F: Fn(D::Native) -> f64,
+    {
+        let base: D::Native = NumCast::from(10)?;
+
+        base.pow_checked(<u32 as From<u8>>::from(scale))
+            .ok()
+            .and_then(|div| match T::KIND {

Review Comment:
   Another place where `and_then` hurts readability
   ```rust
   let div = base.pow_checked(<u32 as From<u8>>::from(scale)).ok()?;
   match T::KIND {
     ...
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to