GitHub user yarlett closed a discussion: How to convert Arc<dyn
arrow::array::Array> to Vec<Option<T>> or polars::series::Series?
Hi,
Is there a good way to convert an `Arc<dyn Array>` to either
`Vec<Option<T>>`, where T is a Rust native type, or else straight to a
`polars::series::Series`?
So far the rough solution I've come up with looks as follows:
```rust
use std::sync::Arc;

fn arrow_array_to_polars_series(
    name: &str,
    array: &Arc<dyn arrow::array::Array>,
) -> Result<polars::series::Series, String> {
    match array.data_type() {
        arrow::datatypes::DataType::Binary => {
            match array.as_any().downcast_ref::<arrow::array::BinaryArray>() {
                Some(downcast) => Ok(polars::series::Series::new(
                    name,
                    downcast.iter().collect::<Vec<Option<&[u8]>>>(),
                )),
                _ => Err("Couldn't downcast!".into()),
            }
        }
        // The DataType, the array type, and the native type must all agree.
        arrow::datatypes::DataType::Int32 => {
            match array.as_any().downcast_ref::<arrow::array::Int32Array>() {
                Some(downcast) => Ok(polars::series::Series::new(
                    name,
                    downcast.iter().collect::<Vec<Option<i32>>>(),
                )),
                _ => Err("Couldn't downcast!".into()),
            }
        }
        // Numerous other arrow::datatypes::DataTypes to be filled in below here...
        _ => Err("Unhandled data type!".into()),
    }
}
```
However, there are at least 3 problems with fully implementing this approach:
1. It requires a separate match arm for each variant of
arrow::datatypes::DataType, of which there are quite a few.
2. Each match arm involves correlating the DataType with the correct arrow
Array type and native Rust type. I know this can be improved using macros but
it's still slightly complex and error prone.
3. There may not be an equivalent Rust native type for every DataType (e.g.
I'm not sure how to handle the match arm for DataType::List), so it may not
even be possible to handle every DataType.
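
The macro idea in point 2 might look roughly like the sketch below. To stay self-contained it does not use the arrow or polars crates: `DataType`, `Array`, `PrimitiveArray`, and `Column` are simplified, hypothetical stand-ins for `arrow::datatypes::DataType`, `arrow::array::Array`, the typed arrow arrays, and `polars::series::Series`, and `convert!` and `array_to_column` are made-up names. It only illustrates the pattern of listing each DataType/native type pair once, not how either library actually does it.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for arrow::datatypes::DataType (three variants only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Boolean,
}

// Hypothetical stand-in for arrow::array::Array.
trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical stand-in for a typed array such as Int32Array.
struct PrimitiveArray<T> {
    data_type: DataType,
    values: Vec<Option<T>>,
}

impl<T: 'static> Array for PrimitiveArray<T> {
    fn data_type(&self) -> DataType {
        self.data_type
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical stand-in for polars::series::Series.
#[derive(Debug, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Float64(Vec<Option<f64>>),
}

// Expands one `Variant => native type` line into a full match arm, so the
// DataType, the array type, and the native type are written down only once.
macro_rules! convert {
    ($array:expr, { $($variant:ident => $native:ty),+ $(,)? }) => {
        match $array.data_type() {
            $(
                DataType::$variant => $array
                    .as_any()
                    .downcast_ref::<PrimitiveArray<$native>>()
                    .map(|a| Column::$variant(a.values.clone()))
                    .ok_or_else(|| format!("Couldn't downcast {:?}!", DataType::$variant)),
            )+
            other => Err(format!("Unhandled data type: {:?}", other)),
        }
    };
}

fn array_to_column(array: &Arc<dyn Array>) -> Result<Column, String> {
    convert!(array, { Int32 => i32, Float64 => f64 })
}
```

Supporting another primitive type then means adding one `Variant => type` pair to the `convert!` call (plus a matching `Column` variant in this toy version). Nested types such as lists would still need hand-written arms, so this addresses problems 1 and 2 but not problem 3.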
I feel like there may well be a better approach than the one sketched above, so
if anyone has any pointers or suggestions I'd be very grateful.
(For context, the arrow arrays are coming from a SQL query executed by
connectorx, and I'm trying to find a way to convert this column data into a
`polars::frame::DataFrame` for subsequent manipulation. I know connectorx has
a .polars() solution, and that connectorx can also return data as arrow2
arrays (arrow2 being a lightweight implementation of arrow used by Polars),
but then I'm tied to the same version of Polars that connectorx uses, which is
very old at this point. I think going in the direction I currently am will let
me decouple the Polars version from connectorx and get access to more recent
and powerful versions of Polars.)
Thanks!
Dan
GitHub link: https://github.com/apache/arrow-rs/discussions/6087