alamb opened a new issue #1047:
URL: https://github.com/apache/arrow-rs/issues/1047


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   When implementing analytics, users often want to do operations like `array` 
+ `constant` or `array + array`
   
   The Rust implementation of Arrow is strongly typed 🙌  , including the 
[compute 
kernels](https://docs.rs/arrow/6.4.0/arrow/compute/kernels/index.html), but 
often data is passed around in `Arc<dyn Array>` / `ArrayRef` so the types can 
be dynamic. This then places the burden on the user of the library to determine 
the exact types of the inputs in order to call the appropriate kernels
   
   Let's take for example, trying to compare an array to find all elements that 
contain the number `5` and your input an array may have `Int64` or `Int32` 
values
   
   In the current version of arrow, in order to do so you would need to write 
code like
   
   ```rust
   use arrow::compute::eq_scalar;
   
   fn find_5(array: ArrayRef) -> Result<BooleanArray>{
     match array.data_type() {
       DataType::Int64 => 
eq_scalar(array.as_any().downcast_ref::<Int64Array>().unwrap(), 5),
       DataType::UInt64 => 
eq_scalar(array.as_any().downcast_ref::<UInt64Array>().unwrap(), 5),
       ...
     }
   }
   ```
   This ends up being macroized and is non ideal because the user had to do 
dynamic type dispatch anyways, so there is no runtime performance benefit to 
strongly typed kernels
   
   **Describe the solution you'd like**
   
   It would be nice to be able to call a single function and let the rust 
library dynamically (at runtime) pick the correct kernel to call.
   
   As described and suggested by @jorgecarleitao  and @nevi-me  in 
https://github.com/apache/arrow-rs/pull/984#issuecomment-994093090 this could 
go all the way and take the form of following the C++ `Datum` model
   
   ```rust
   fn eq(left: Datum, right: Datum) -> BooleanArray {
   ...
   }
   ```
   
   Where `Datum` looks something like
   
   ```rust
   enum Datum {
     Scalar(ScalarValue),
     Array(ArrayRef)
   }
   ```
   
   And `ScalarValue` which would look something like `ScalarValue` from 
DataFusion 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/scalar.rs 
or `Scalar` from arrow2 
https://github.com/jorgecarleitao/arrow2/tree/main/src/scalar
   
   **Describe alternatives you've considered**
   The alternative is a continued proliferation of individual kernels such as 
`eq`, `eq_utf8`, etc
   
   **Additional context**
   There is a lot of additional discussion on 
https://github.com/apache/arrow-rs/pull/984
   
   cc @matthewmturner @Dandandan @Jimexist @houqp 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to