klion26 commented on code in PR #9689:
URL: https://github.com/apache/arrow-rs/pull/9689#discussion_r3079382500
##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -246,6 +254,18 @@ where
precision,
scale,
),
+ Variant::Float(f) => single_float_to_decimal::<O>(f64::from(*f), mul),
+ Variant::Double(f) => single_float_to_decimal::<O>(*f, mul),
+ Variant::String(v) if scale > 0 => parse_string_to_decimal_native::<O>(
Review Comment:
I have reviewed the code today, `arrow-cast` has checked the scale in
`decimal.rs::cast_string_to_decimal`(only supports scale >=0 when cast uf8 to
decimal), and I need to update the code here from `>0` to `>=0`, will add a
comment here also.
https://github.com/apache/arrow-rs/blob/711fac88104fc27d89faaa221345bc95686cf1e9/arrow-cast/src/cast/decimal.rs#L725-L730
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -89,7 +89,7 @@ where
D: DecimalType,
F: Fn(D::Native) -> f64,
{
- f(x) / 10_f64.powi(scale)
+ f(x) * 10_f64.powi(-scale)
Review Comment:
I have the same question before changing the code, but the [code on play
rust](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=16e7364eea29db105adc2f9fc297b7c7)
below shows that they are equal, not sure if this is enought.
yes, If this code below is not enough,we should keep it as it in the current
pr
```
let max: u8 = u8::MAX;
for i in 0..max {
let left = 1f64 / 10_f64.powi(<i32 as From::<u8>>::from(i));
let right = 1f64 * 10_f64.powi(-<i32 as From::<u8>>::from(i));
if left != right {
println!("No equal {:?}", i);
}
}
println!("Over")
```
##########
parquet-variant/src/variant.rs:
##########
@@ -815,9 +883,21 @@ impl<'m, 'v> Variant<'m, 'v> {
Variant::Int64(i) => num_cast(i),
Variant::Float(f) => num_cast(f),
Variant::Double(d) => num_cast(d),
- Variant::Decimal4(d) if d.scale() == 0 => num_cast(d.integer()),
- Variant::Decimal8(d) if d.scale() == 0 => num_cast(d.integer()),
- Variant::Decimal16(d) if d.scale() == 0 => num_cast(d.integer()),
+ Variant::Decimal4(d) => {
+ Self::cast_decimal_to_num::<Decimal32Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
+ Variant::Decimal8(d) => {
+ Self::cast_decimal_to_num::<Decimal64Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
+ Variant::Decimal16(d) => {
+ Self::cast_decimal_to_num::<Decimal128Type, T, _>(d.integer(),
d.scale(), |x| {
+ x as f64
+ })
+ }
Review Comment:
try to manually change it, fmt changed back to this format.
##########
parquet-variant-compute/src/type_conversion.rs:
##########
@@ -246,6 +254,18 @@ where
precision,
scale,
),
+ Variant::Float(f) => single_float_to_decimal::<O>(f64::from(*f), mul),
+ Variant::Double(f) => single_float_to_decimal::<O>(*f, mul),
+ Variant::String(v) if scale > 0 => parse_string_to_decimal_native::<O>(
+ v,
+ <i8 as TryInto<usize>>::try_into(scale).expect("scale is positive,
would never fail"),
Review Comment:
Will fix it. Thanks for the feedback. I'd very much appreciate your
feedback, they let me learn more about the Rust type system and the best
practices.
Use `from` because the compiler can guarantee the function is infallible,
but the case here, the compiler can't guarantee (the logic guarantees it's
infallible)
##########
parquet-variant/src/variant.rs:
##########
@@ -797,14 +838,41 @@ impl<'m, 'v> Variant<'m, 'v> {
}
}
- /// Converts a boolean or numeric variant(integers, floating-point, and
decimals with scale 0)
+ fn cast_decimal_to_num<D, T, F>(raw: D::Native, scale: u8, as_float: F) ->
Option<T>
+ where
+ D: DecimalType,
+ D::Native: NumCast + ArrowNativeTypeOp,
+ T: DecimalCastTarget,
+ F: Fn(D::Native) -> f64,
+ {
+ let base: D::Native = NumCast::from(10)?;
+
+ base.pow_checked(<u32 as From<u8>>::from(scale))
+ .ok()
+ .and_then(|div| match T::KIND {
Review Comment:
Fixed
##########
parquet-variant/src/variant.rs:
##########
@@ -294,6 +302,39 @@ pub enum Variant<'m, 'v> {
// We don't want this to grow because it could hurt performance of a
frequently-created type.
const _: () = crate::utils::expect_size_of::<Variant>(80);
+enum NumericKind {
+ Integer,
+ Float,
+}
+
+trait DecimalCastTarget: NumCast + Default {
+ const KIND: NumericKind;
+ fn arrow_type() -> DataType;
+}
+
+macro_rules! impl_decimal_cast_target {
+ ($raw_type: ident, $target_kind:expr, $arrow_type: expr) => {
+ impl DecimalCastTarget for $raw_type {
+ const KIND: NumericKind = $target_kind;
+ fn arrow_type() -> DataType {
+ $arrow_type
+ }
Review Comment:
Changed to const, This style is more consistent.
##########
parquet-variant/src/variant.rs:
##########
@@ -1199,14 +1320,27 @@ impl<'m, 'v> Variant<'m, 'v> {
/// let v1 = Variant::from(VariantDecimal4::try_new(1234_i32, 2).unwrap());
/// assert_eq!(v1.as_decimal16(), VariantDecimal16::try_new(1234_i128,
2).ok());
///
+ /// // or from a string variant if it can be parsed as decimal
+ /// let v2 = Variant::from("123.45");
+ /// assert_eq!(v2.as_decimal16(), VariantDecimal16::try_new(12345,
2).ok());
+ ///
/// // but not if the variant is not a decimal
- /// let v2 = Variant::from("hello!");
- /// assert_eq!(v2.as_decimal16(), None);
+ /// let v3 = Variant::from("hello!");
+ /// assert_eq!(v3.as_decimal16(), None);
/// ```
pub fn as_decimal16(&self) -> Option<VariantDecimal16> {
match *self {
- Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) |
Variant::Int64(_) => {
- self.as_num::<i128>().and_then(|x| x.try_into().ok())
+ Variant::Int8(_) | Variant::Int16(_) | Variant::Int32(_) |
Variant::Int64(_) => self
+ .as_num::<i64>()
+ .map(|x| <i128 as From<i64>>::from(x).try_into().ok())
+ .unwrap(),
Review Comment:
Changed to the suggestion, it looks better
use unwrap here because this is always safe, didn't comment this because I
didn't add a comment because I thought it would be too direct to get this.
Yes, the compiler needs it because there are multiple `from` found
```
Note: candidate #1 is defined in the trait `From`
Note: candidate #2 is defined in an impl of the trait `num_traits::NumCast`
for the type `i128
```
##########
parquet-variant/src/variant.rs:
##########
@@ -1106,6 +1186,23 @@ impl<'m, 'v> Variant<'m, 'v> {
self.as_num()
}
+ fn convert_string_to_decimal<D, VD>(input: &str) -> Option<VD>
+ where
+ D: DecimalType,
+ VD: VariantDecimalType<Native = D::Native>,
+ D::Native: NumCast + DecimalCast,
+ {
+ // find the last '.'
+ let scale_usize = input
+ .rsplit_once('.')
+ .map_or_else(|| 0, |(_, frac)| frac.len());
Review Comment:
Fixed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]