alamb commented on a change in pull request #1394:
URL: https://github.com/apache/arrow-datafusion/pull/1394#discussion_r762037532



##########
File path: datafusion/src/scalar.rs
##########
@@ -453,6 +477,26 @@ macro_rules! eq_array_primitive {
 }
 
 impl ScalarValue {
+    /// Create a decimal Scalar from value/precision and scale.
+    pub fn try_new_decimal128(
+        value: i128,
+        precision: usize,
+        scale: usize,
+    ) -> Result<Self> {
+        // make sure the precision and scale is valid
+        // TODO const the max precision and min scale

Review comment:
       The arrow spec doesn't seem to say anything about min/max valid scales: 
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L181-L190
   
   I poked around a little in arrow-rs and didn't find any limits on the 
precision or scale there either.
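   
   For what it's worth, here is a minimal sketch of what the validation might 
look like if we do pin a maximum ourselves (38 digits is the most an `i128` 
can faithfully represent; the constant name here is an assumption, not an 
existing arrow-rs API):
   
   ```
   /// Hypothetical constant: 38 decimal digits is the most an i128 can hold.
   const MAX_PRECISION_FOR_DECIMAL128: usize = 38;
   
   pub fn try_new_decimal128(value: i128, precision: usize, scale: usize) -> Result<Self> {
       // Reject zero / oversized precision, and scales larger than the precision
       if precision == 0 || precision > MAX_PRECISION_FOR_DECIMAL128 || scale > precision {
           return Err(DataFusionError::Internal(format!(
               "Invalid precision {} or scale {} for Decimal128",
               precision, scale
           )));
       }
       Ok(ScalarValue::Decimal128(Some(value), precision, scale))
   }
   ```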

##########
File path: datafusion/src/scalar.rs
##########
@@ -171,6 +179,17 @@ impl PartialOrd for ScalarValue {
         // any newly added enum variant will require editing this list
         // or else face a compile error
         match (self, other) {
+            // TODO decimal type, we just compare the values which have the same precision and scale.

Review comment:
       I think the code in this PR is correct -- two decimals can be compared 
if they have the same precision/scale, and they can't be compared (returns 
`None`) if they have different precisions/scales.
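   
   To spell that out, the `partial_cmp` arm would look something like this (a 
sketch, assuming the `Decimal128(Option<i128>, usize, usize)` shape suggested 
below):
   
   ```
   (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => {
       if p1 == p2 && s1 == s2 {
           // Same decimal type: compare the underlying i128 values
           v1.partial_cmp(v2)
       } else {
           // Different precision/scale: not comparable
           None
       }
   }
   ```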

##########
File path: datafusion/src/scalar.rs
##########
@@ -100,6 +102,12 @@ impl PartialEq for ScalarValue {
         // any newly added enum variant will require editing this list
         // or else face a compile error
         match (self, other) {
+            (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => {
+                v1.eq(v2) && p1.eq(p2) && s1.eq(s2)
+                // TODO how to handle this case: decimal(123,10,1) with decimal(1230,10,2)

Review comment:
        `Int8(Some(100))` is not equal to `Int64(Some(100))` with this 
implementation either, so I think treating `decimal(123,10,1)` and 
`decimal(1230,10,2)` as different is consistent with the rest of this 
comparison.
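   
   Concretely (a hypothetical check, again assuming the 
`Decimal128(Option<i128>, usize, usize)` shape suggested below):
   
   ```
   // 12.3 as decimal(10,1) vs 12.30 as decimal(10,2): different
   // precision/scale, so not equal, just like Int8 vs Int64
   assert_ne!(
       ScalarValue::Decimal128(Some(123), 10, 1),
       ScalarValue::Decimal128(Some(1230), 10, 2)
   );
   ```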
   
   

##########
File path: datafusion/src/scalar.rs
##########
@@ -43,6 +43,8 @@ pub enum ScalarValue {
     Float32(Option<f32>),
     /// 64bit float
     Float64(Option<f64>),
+    /// 128bit decimal, using the i128 to represent the decimal
+    Decimal128(Option<i128>, Option<usize>, Option<usize>),

Review comment:
       🤔  it seems like we always need the precision and scale (to know what 
type the Decimal is). 
   
   Thus, perhaps this could be changed to be
   
   ```suggestion
       Decimal128(Option<i128>, usize, usize),
   ```
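   
   That way even a NULL decimal scalar still knows its type, e.g. (a small 
sketch; I believe `get_datatype` is the existing accessor on `ScalarValue`, 
but double check the name):
   
   ```
   // A NULL decimal still carries enough information to recover its DataType
   let null_decimal = ScalarValue::Decimal128(None, 10, 2);
   assert_eq!(null_decimal.get_datatype(), DataType::Decimal(10, 2));
   ```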

##########
File path: datafusion/src/scalar.rs
##########
@@ -465,6 +509,14 @@ impl ScalarValue {
             ScalarValue::Int16(_) => DataType::Int16,
             ScalarValue::Int32(_) => DataType::Int32,
             ScalarValue::Int64(_) => DataType::Int64,
+            ScalarValue::Decimal128(_, Some(precision), Some(scale)) => {
+                DataType::Decimal(*precision, *scale)
+            }
+            ScalarValue::Decimal128(_, _, _) => {
+                // TODO add the default precision and scale for this case
+                // DataType::Decimal(38, 0)
+                panic!("The Decimal Scalar value with invalid precision or scale.");

Review comment:
       I think if you changed `Decimal128` as suggested above to always have a 
precision and scale, this code path would no longer be possible.
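   
   i.e. the arm would become total, something like:
   
   ```
   // No panic arm needed: precision and scale are always present
   ScalarValue::Decimal128(_, precision, scale) => {
       DataType::Decimal(*precision, *scale)
   }
   ```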

##########
File path: datafusion/src/scalar.rs
##########
@@ -1061,19 +1176,41 @@ impl ScalarValue {
                     Arc::new(StructArray::from(field_values))
                 }
                 None => {
-                    let field_values: Vec<_> = fields.iter().map(|field| {
+                    let field_values: Vec<_> = fields
+                        .iter()
+                        .map(|field| {
                            let none_field = Self::try_from(field.data_type()).expect(
                                "Failed to construct null ScalarValue from Struct field type"
                            );
                             (field.clone(), none_field.to_array_of_size(size))
-                        }).collect();
+                        })
+                        .collect();
 
                     Arc::new(StructArray::from(field_values))
                 }
             },
         }
     }
 
+    fn get_decimal_value_from_array(
+        array: &ArrayRef,
+        index: usize,
+        precision: &usize,
+        scale: &usize,
+    ) -> ScalarValue {
+        let array = array.as_any().downcast_ref::<DecimalArray>().unwrap();
+        // TODO add checker: the precision and scale are same with array

Review comment:
       This would be a good `assert!`-style check to add, though I don't think 
it would ever fail unless there is some bug.
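   
   Something like the following (a sketch; I believe `DecimalArray` exposes 
`precision()` / `scale()` accessors, but worth double checking):
   
   ```
   // Sanity check that the scalar's type matches the array's type;
   // this should only ever fire if there is a bug elsewhere
   assert_eq!(array.precision(), *precision);
   assert_eq!(array.scale(), *scale);
   ```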

##########
File path: datafusion/src/scalar.rs
##########
@@ -831,13 +892,40 @@ impl ScalarValue {
                     "Unsupported creation of {:?} array from ScalarValue {:?}",
                     data_type,
                     scalars.peek()
-                )))
+                )));
             }
         };
 
         Ok(array)
     }
 
+    fn iter_to_decimal_array(
+        scalars: impl IntoIterator<Item = ScalarValue>,
+        precision: &usize,
+        scale: &usize,
+    ) -> Result<DecimalArray> {
+        // collect the value as Option<i128>
+        let array = scalars
+            .into_iter()
+            .map(|element: ScalarValue| match element {
+                ScalarValue::Decimal128(v1, _, _) => v1,
+                _ => unreachable!(),
+            })
+            .collect::<Vec<Option<i128>>>();

Review comment:
       Using `DecimalBuilder` in this way is awkward. I wonder if we could add 
a function to create Decimal values from an array of `i128` (it would likely 
go in arrow-rs). Something like
   
   ```
   let arr = DecimalArray::from_iter_and_scale(array.into_iter(), precision, scale)
   ```
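   
   For reference, here is roughly what the rest of the `DecimalBuilder` 
version looks like today (a sketch; the builder method signatures are from 
memory, so treat them as assumptions):
   
   ```
   // Build the DecimalArray value-by-value from the collected Option<i128>s
   let mut builder = DecimalBuilder::new(array.len(), *precision, *scale);
   for value in array {
       match value {
           Some(v) => builder.append_value(v)?,
           None => builder.append_null()?,
       }
   }
   Ok(builder.finish())
   ```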

##########
File path: datafusion/src/scalar.rs
##########
@@ -253,6 +272,11 @@ impl std::hash::Hash for ScalarValue {
     fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
         use ScalarValue::*;
         match self {
+            Decimal128(v, p, s) => {
+                v.hash(state);
+                p.hash(state);
+                s.hash(state)

Review comment:
       I think we could probably skip hashing the precision and scale, and 
just hash the value.
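   
   i.e. something like:
   
   ```
   // Equal scalars must share precision/scale anyway, so hashing
   // the value alone is sufficient
   Decimal128(v, _, _) => v.hash(state),
   ```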



