[GitHub] [arrow-datafusion] metesynnada opened a new issue, #3572: Decimal128 support for statistical aggregations

GitBox Wed, 21 Sep 2022 06:25:05 -0700


metesynnada opened a new issue, #3572:
URL: https://github.com/apache/arrow-datafusion/issues/3572


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   In the `Accumulator` implementation of `VarianceAccumulator` (and of many 
others, such as `CovarianceAccumulator`), the `update_batch` method casts every 
input to `Float64Array` by default. As a result, statistical aggregations do 
not support `Decimal128Array` inputs.
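   
   To make the cast concrete, here is a minimal, self-contained sketch of a 
variance accumulator that only understands `f64`. It is not DataFusion's actual 
`VarianceAccumulator`: the `F64Variance` type, the Welford-style update, and the 
`decimal_to_f64` helper are all assumptions made for illustration. A decimal 
value, stored as an unscaled `i128` plus a scale, has to be converted to `f64` 
before it can be fed in.
   
   ```rust
   /// Hypothetical, simplified accumulator (not DataFusion's `VarianceAccumulator`).
   #[derive(Debug, Default)]
   struct F64Variance {
       count: u64,
       mean: f64,
       m2: f64, // running sum of squared deviations from the mean
   }

   impl F64Variance {
       /// Accepts `f64` only, mirroring an accumulator that casts its input to Float64.
       /// Null (`None`) entries are skipped.
       fn update_batch(&mut self, values: &[Option<f64>]) {
           for v in values.iter().flatten() {
               // Welford-style running update of the mean and squared deviations.
               self.count += 1;
               let delta = v - self.mean;
               self.mean += delta / self.count as f64;
               self.m2 += delta * (v - self.mean);
           }
       }

       /// Sample variance; `None` when fewer than two values were seen.
       fn sample_variance(&self) -> Option<f64> {
           if self.count < 2 {
               None
           } else {
               Some(self.m2 / (self.count - 1) as f64)
           }
       }
   }

   /// Hypothetical helper: converts an unscaled decimal value to `f64`.
   /// With scale 2, the unscaled value 150 represents 1.50.
   fn decimal_to_f64(unscaled: i128, scale: u32) -> f64 {
       unscaled as f64 / 10f64.powi(scale as i32)
   }
   ```
   
   In this sketch the decimal type is lost at the conversion step, so the 
accumulator never sees the precision and scale of its input.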
   
   **Describe the solution you'd like**
   Steps to reproduce the behavior:
   
   ```rust
   use std::sync::Arc;

   use datafusion::arrow::array::Decimal128Array;
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::datasource::MemTable;
   use datafusion::error::Result;
   use datafusion::prelude::SessionContext;

   #[tokio::test]
   async fn statistical_agg_decimal() -> Result<()> {
       // Define a schema with a single nullable Decimal128(10, 2) column.
       let schema = Arc::new(Schema::new(vec![Field::new(
           "a",
           DataType::Decimal128(10, 2),
           true,
       )]));

       // Define the data as a single record batch (one partition).
       // The integers are unscaled values, so `Some(1)` represents 0.01.
       let batch = RecordBatch::try_new(
           schema.clone(),
           vec![Arc::new(
               (1..100)
                   .map(|i| if i == 2 { None } else { Some(i) })
                   .collect::<Decimal128Array>()
                   .with_precision_and_scale(10, 2)?,
           )],
       )?;

       // Declare a new context. In the Spark API, this corresponds to a new SparkSession.
       let ctx = SessionContext::new();

       // Declare a table in memory. In the Spark API, this corresponds to createDataFrame(...).
       let provider = MemTable::try_new(schema, vec![vec![batch]])?;
       ctx.register_table("t", Arc::new(provider))?;

       let sql = "SELECT VAR(a) OVER() FROM t";

       let df = ctx.sql(sql).await?;
       df.show().await?;
       Ok(())
   }
   ```
   
   produces
   
   ```text
   Error: Plan("The function Variance does not support inputs of type Decimal128(10, 2).")
   ```
   
   **Describe alternatives you've considered**
   N/A
   
   **Additional context**
   If we implement the necessary arithmetic traits for the `ScalarValue` enum, 
such as `std::ops::Div` and `std::ops::Mul`, the accumulators can compute 
directly on `ScalarValue` for both `Decimal128` and `Float64` inputs instead of 
coercing everything to `Float64` by default.
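   
   A rough sketch of what such trait implementations could look like, using a 
two-variant stand-in enum rather than the real `ScalarValue`. The `Scalar` type, 
its field layout, the `Option` return type, and the choice to keep the dividend's 
scale (with truncation) on division are all assumptions made for illustration; a 
real implementation would also need to track precision and decide how mixed 
types behave.
   
   ```rust
   use std::ops::{Div, Mul};

   /// Hypothetical stand-in for DataFusion's `ScalarValue`, reduced to two variants.
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum Scalar {
       Float64(f64),
       /// `value` is the unscaled integer; the number it represents is value / 10^scale.
       Decimal128 { value: i128, scale: u32 },
   }

   impl Mul for Scalar {
       /// `None` signals overflow or an unsupported type combination.
       type Output = Option<Scalar>;

       fn mul(self, rhs: Scalar) -> Option<Scalar> {
           match (self, rhs) {
               (Scalar::Float64(a), Scalar::Float64(b)) => Some(Scalar::Float64(a * b)),
               // (a / 10^s1) * (b / 10^s2) = (a * b) / 10^(s1 + s2): the scales add up.
               (
                   Scalar::Decimal128 { value: a, scale: s1 },
                   Scalar::Decimal128 { value: b, scale: s2 },
               ) => Some(Scalar::Decimal128 {
                   value: a.checked_mul(b)?,
                   scale: s1.checked_add(s2)?,
               }),
               // Mixed Float64/Decimal128 arithmetic is left out of this sketch.
               _ => None,
           }
       }
   }

   impl Div for Scalar {
       /// `None` signals overflow, division by zero, or an unsupported type combination.
       type Output = Option<Scalar>;

       fn div(self, rhs: Scalar) -> Option<Scalar> {
           match (self, rhs) {
               (Scalar::Float64(a), Scalar::Float64(b)) => Some(Scalar::Float64(a / b)),
               // Rescale the dividend by 10^s2 so the quotient keeps scale s1.
               // Integer division truncates digits beyond that scale.
               (
                   Scalar::Decimal128 { value: a, scale: s1 },
                   Scalar::Decimal128 { value: b, scale: s2 },
               ) => {
                   let rescaled = a.checked_mul(10i128.checked_pow(s2)?)?;
                   Some(Scalar::Decimal128 {
                       value: rescaled.checked_div(b)?,
                       scale: s1,
                   })
               }
               _ => None,
           }
       }
   }
   ```
   
   With these definitions, multiplying 1.50 by 2.00 (unscaled 150 and 200 at 
scale 2) stays in integer arithmetic and yields unscaled 30000 at scale 4, and 
dividing 10.50 by 2.00 yields unscaled 525 at scale 2.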
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
