GitHub user yarlett closed a discussion: How to convert Arc<dyn 
arrow::array::Array> to Vec<Option<T>> or polars::series::Series?

Hi,

Is there a good way to convert an ```Arc<dyn Array>``` to either 
```Vec<Option<T>>```, where T is a Rust native type, or else straight to a 
```polars::series::Series```?

So far the rough solution I've come up with looks as follows:

```rust
use std::sync::Arc;

fn arrow_array_to_polars_series(
    name: &str,
    array: &Arc<dyn arrow::array::Array>,
) -> Result<polars::series::Series, String> {
    match array.data_type() {
        arrow::datatypes::DataType::Binary => {
            match array.as_any().downcast_ref::<arrow::array::BinaryArray>() {
                Some(downcast) => Ok(polars::series::Series::new(
                    name,
                    downcast.iter().collect::<Vec<Option<&[u8]>>>(),
                )),
                _ => Err("Couldn't downcast!".into()),
            }
        }
        // Int32 pairs with Int32Array and i32 (the arm previously matched Int8).
        arrow::datatypes::DataType::Int32 => {
            match array.as_any().downcast_ref::<arrow::array::Int32Array>() {
                Some(downcast) => Ok(polars::series::Series::new(
                    name,
                    downcast.iter().collect::<Vec<Option<i32>>>(),
                )),
                _ => Err("Couldn't downcast!".into()),
            }
        }
        // Numerous other arrow::datatypes::DataTypes to be filled in here...
        _ => Err("Unhandled data type!".into()),
    }
}

```

However, there are at least 3 problems with fully implementing this approach:
  1. It requires a separate match arm for each variant of 
arrow::datatypes::DataType, of which there are quite a few.
  2. Each match arm involves correlating the DataType with the correct arrow 
Array type and native Rust type. I know this can be improved using macros, but 
it's still slightly complex and error-prone.
  3. There may not be an equivalent Rust native type for every DataType (e.g. 
I'm not sure how to handle the match arm for DataType::List), so it may not 
even be possible to handle every DataType.
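
To make the macro idea in point 2 concrete, here is a minimal sketch of what I have in mind. It deliberately uses small stand-in types built on `std::any::Any` rather than the real crates: `DataType`, `Array`, `Int32Array`, `Float64Array`, `Utf8Array` and `Column` below are all simplified, hypothetical placeholders, not arrow's or Polars' actual definitions, and `array_to_column` is just an illustrative name. The point is only the shape: each supported type becomes one `Variant => ArrayType` entry, so the DataType and the array type it is paired with sit side by side.

```rust
use std::any::Any;
use std::sync::Arc;

// Stand-in for arrow::datatypes::DataType (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum DataType { Int32, Float64, Utf8 }

// Stand-in for the arrow Array trait: only the two methods this sketch needs.
trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
}

// Stand-ins for concrete arrays; each wraps its nullable values directly.
struct Int32Array(Vec<Option<i32>>);
struct Float64Array(Vec<Option<f64>>);
#[allow(dead_code)]
struct Utf8Array(Vec<Option<String>>);

// Stand-in for the converted output (where a Series would be built).
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Float64(Vec<Option<f64>>),
}

// Implements the stand-in Array trait for one concrete array type.
macro_rules! impl_array {
    ($array:ident, $variant:ident) => {
        impl Array for $array {
            fn data_type(&self) -> DataType { DataType::$variant }
            fn as_any(&self) -> &dyn Any { self }
        }
    };
}
impl_array!(Int32Array, Int32);
impl_array!(Float64Array, Float64);
impl_array!(Utf8Array, Utf8);

// Expands each `Variant => ArrayType` entry into one match arm that
// downcasts and copies the values; anything unlisted hits the fallback arm.
macro_rules! convert {
    ($array:expr, $( $variant:ident => $concrete:ty ),+ $(,)?) => {
        match $array.data_type() {
            $(
                DataType::$variant => $array
                    .as_any()
                    .downcast_ref::<$concrete>()
                    .map(|a| Column::$variant(a.0.clone()))
                    .ok_or_else(|| {
                        format!("Couldn't downcast to {}", stringify!($concrete))
                    }),
            )+
            other => Err(format!("Unhandled data type: {:?}", other)),
        }
    };
}

// Supporting a new type means adding one entry to this list.
fn array_to_column(array: &Arc<dyn Array>) -> Result<Column, String> {
    convert!(array, Int32 => Int32Array, Float64 => Float64Array)
}
```

Calling `array_to_column` on an `Arc<dyn Array>` holding the stand-in `Int32Array` gives back `Column::Int32` with the nulls preserved, while the stand-in `Utf8Array` falls through to the "Unhandled data type" error. This does not address point 3: nested types such as List would still need a different representation than a flat `Vec<Option<T>>`.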

I feel like there may well be a better approach than the one sketched above, so 
if anyone has any pointers or suggestions I'd be very grateful.

(For context, the arrow arrays come from a SQL query executed by connectorx, 
and I'm trying to convert this column data into a 
```polars::frame::DataFrame``` for subsequent manipulation. I know connectorx 
has a .polars() solution, and that it can also return data as arrow2 arrays, 
arrow2 being the lightweight implementation of arrow used by Polars. Either 
route ties me to the version of Polars used by connectorx, which is very old at 
this point. I think the direction I'm currently going will let me decouple the 
Polars version from connectorx and get access to more recent and powerful 
versions of Polars.)

Thanks!

Dan

GitHub link: https://github.com/apache/arrow-rs/discussions/6087
