gstvg commented on issue #5700:
URL: https://github.com/apache/arrow-rs/issues/5700#issuecomment-2111366957
Hi @HadrienG2
Are you working on array builders only, or on accessing concrete array data
too?
I've been playing with typed accessors for concrete arrays for the past few
weeks, and pushed part of it as draft #5767.
My current approach is to use derive macros on a struct of arrays:
```
#[derive(StructArray)]
struct MyStructArray<'a> {
    number: &'a Int32Array,
    text: &'a StringArray,
}
```
This would implement `TryFrom<&'a StructArray>` and
`TypedStructInnerAccessor` for `MyStructArray`, and `ArrayAccessor` for
`TypedStructArray<T: TypedStructInnerAccessor>` with
`type Item = TypedStructInnerAccessor::Item`,
where `TypedStructInnerAccessor::Item` is a generated type with a few
methods for each column:
```
struct MyStructArrayStructAccessorItem<'a> {
    // Named `struct_` to match its uses below; held by reference to match the
    // `From<(&'a TypedStructArray<Self>, usize)>` bound on the item type.
    struct_: &'a TypedStructArray<'a, MyStructArray<'a>>,
    index: usize,
}
impl<'a> MyStructArrayStructAccessorItem<'a> {
    // Validity of the struct row itself.
    #[inline]
    fn is_valid(&self) -> ::std::primitive::bool {
        self.struct_.is_valid(self.index)
    }
    #[inline]
    fn is_null(&self) -> ::std::primitive::bool {
        self.struct_.is_null(self.index)
    }
    // Value of the `text` column for this row, ignoring its validity.
    #[inline]
    fn text(&self) -> <&'a StringArray as ::arrow::array::ArrayAccessor>::Item {
        (&self.struct_.fields().text).value(self.index)
    }
    // Value of the `text` column for this row, or `None` if it is null.
    #[inline]
    fn text_opt(
        &self,
    ) -> ::std::option::Option<
        <&'a StringArray as ::arrow::array::ArrayAccessor>::Item,
    > {
        if (&self.struct_.fields().text).is_valid(self.index) {
            Some((&self.struct_.fields().text).value(self.index))
        } else {
            None
        }
    }
    #[inline]
    fn is_text_valid(&self) -> ::std::primitive::bool {
        self.struct_.fields().text.is_valid(self.index)
    }
    // ... the same methods for `number`
}
```
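For the `TryFrom<&'a StructArray>` half of the derive, the idea is presumably to look up each field by name and check its concrete type once, up front. The following is only a minimal sketch of that idea using std-only stand-ins: `UntypedStruct`, `IntColumn`, `TextColumn` and `MyStruct` are hypothetical types invented for illustration, not arrow-rs types, and the error type is a plain `String` rather than `ArrowError`.

```rust
use std::any::Any;
use std::convert::TryFrom;

// Hypothetical stand-ins for `Int32Array` / `StringArray` (std-only).
#[derive(Debug)]
struct IntColumn(Vec<i32>);
#[derive(Debug)]
struct TextColumn(Vec<String>);

// Hypothetical stand-in for an untyped `StructArray`:
// named, type-erased child columns.
struct UntypedStruct {
    columns: Vec<(String, Box<dyn Any>)>,
}

impl UntypedStruct {
    // Find a child column by name, still type-erased.
    fn column(&self, name: &str) -> Option<&dyn Any> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, c)| &**c)
    }
}

// What the user writes: typed borrows of each column.
#[derive(Debug)]
struct MyStruct<'a> {
    number: &'a IntColumn,
    text: &'a TextColumn,
}

// Roughly what a derive could generate: one downcast per field.
impl<'a> TryFrom<&'a UntypedStruct> for MyStruct<'a> {
    type Error = String;

    fn try_from(value: &'a UntypedStruct) -> Result<Self, Self::Error> {
        let number = value
            .column("number")
            .and_then(|c| c.downcast_ref::<IntColumn>())
            .ok_or_else(|| "missing or mistyped column `number`".to_string())?;
        let text = value
            .column("text")
            .and_then(|c| c.downcast_ref::<TextColumn>())
            .ok_or_else(|| "missing or mistyped column `text`".to_string())?;
        Ok(MyStruct { number, text })
    }
}
```

With this shape, all type checks happen once at conversion time, and later per-row access needs no further downcasts.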
`TypedStructArray` def:
```
#[derive(Debug)]
pub struct TypedStructArray<'a, T> {
    fields: T,
    struct_: &'a StructArray,
}

impl<'a, T> TypedStructArray<'a, T> {
    pub fn fields(&self) -> &T {
        &self.fields
    }
}

impl<'a, T: TryFrom<&'a StructArray, Error = ArrowError>> TryFrom<&'a dyn Array>
    for TypedStructArray<'a, T>
{
    type Error = ArrowError;

    fn try_from(value: &'a dyn Array) -> Result<Self, Self::Error> {
        let struct_ = <&'a StructArray>::try_from(value)?;
        Ok(Self {
            // `struct_` is already a `&'a StructArray`, so pass it directly.
            fields: T::try_from(struct_)?,
            struct_,
        })
    }
}

impl<'a, T: TypedStructInnerAccessor<'a>> ArrayAccessor for &'a TypedStructArray<'a, T> {
    type Item = T::Item;

    fn value(&self, index: ::std::primitive::usize) -> Self::Item {
        assert!(
            index < self.len(),
            "Trying to access an element at index {} from a TypedStructArray of length {}",
            index,
            self.len()
        );
        // SAFETY: `index` was bounds-checked by the assertion above.
        unsafe { self.value_unchecked(index) }
    }

    unsafe fn value_unchecked(&self, index: ::std::primitive::usize) -> Self::Item {
        (*self, index).into()
    }
}

pub trait TypedStructInnerAccessor<'a>: std::fmt::Debug + Send + Sync + Sized + 'a {
    type Item: std::fmt::Debug + Send + Sync + From<(&'a TypedStructArray<'a, Self>, usize)>;
}

impl<'a, T: std::fmt::Debug + Send + Sync> Array for TypedStructArray<'a, T> {
    fn as_any(&self) -> &dyn std::any::Any {
        self.struct_.as_any()
    }
    // ... forward other Array methods to self.struct_
}
```
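The pattern above (an inner type that names its per-row item, plus an item built from a `(wrapper, index)` pair via `From`) can be tried in isolation. This is a simplified std-only sketch, not the draft's code: `Typed`, `InnerAccessor`, `Points` and `PointItem` are hypothetical names, and plain `Vec`s stand in for Arrow arrays.

```rust
// Hypothetical stand-in for the wrapper: typed fields plus a row count.
#[derive(Debug)]
struct Typed<T> {
    fields: T,
    len: usize,
}

// The inner type names its per-row item; the item must be constructible
// from a (wrapper, index) pair, mirroring the `From` bound in the draft.
trait InnerAccessor<'a>: Sized + 'a {
    type Item: From<(&'a Typed<Self>, usize)>;
}

impl<T> Typed<T> {
    fn fields(&self) -> &T {
        &self.fields
    }

    // Bounds-checked access that defers construction to the `From` impl.
    fn value<'a>(&'a self, index: usize) -> <T as InnerAccessor<'a>>::Item
    where
        T: InnerAccessor<'a>,
    {
        assert!(
            index < self.len,
            "index {} out of bounds for length {}",
            index,
            self.len
        );
        (self, index).into()
    }
}

// A concrete "struct of columns" and its hand-written item type.
#[derive(Debug)]
struct Points {
    x: Vec<i32>,
    y: Vec<i32>,
}

struct PointItem<'a> {
    array: &'a Typed<Points>,
    index: usize,
}

impl<'a> From<(&'a Typed<Points>, usize)> for PointItem<'a> {
    fn from((array, index): (&'a Typed<Points>, usize)) -> Self {
        PointItem { array, index }
    }
}

impl<'a> InnerAccessor<'a> for Points {
    type Item = PointItem<'a>;
}

impl<'a> PointItem<'a> {
    // Each accessor indexes a single column on demand.
    fn x(&self) -> i32 {
        self.array.fields().x[self.index]
    }
    fn y(&self) -> i32 {
        self.array.fields().y[self.index]
    }
}
```

Here `typed.value(i)` only packages a reference and an index; the column reads happen later, in `x()` and `y()`.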
The reason for returning an intermediate value with methods, instead of a
tuple or a struct of values, is to avoid accessing any memory that the user
may not want to read.
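To make that trade-off observable, here is a small std-only sketch that counts column reads. Everything in it (`CountingColumn`, `Rows`, `RowItem`, `row_tuple`, `row_item`) is a hypothetical stand-in written for illustration, not part of arrow-rs or of the draft.

```rust
use std::cell::Cell;

// Hypothetical column that counts how many values have been read.
struct CountingColumn<T> {
    values: Vec<T>,
    reads: Cell<usize>,
}

impl<T: Clone> CountingColumn<T> {
    fn new(values: Vec<T>) -> Self {
        CountingColumn { values, reads: Cell::new(0) }
    }

    // Read one value and record the access.
    fn value(&self, index: usize) -> T {
        self.reads.set(self.reads.get() + 1);
        self.values[index].clone()
    }

    fn read_count(&self) -> usize {
        self.reads.get()
    }
}

struct Rows {
    number: CountingColumn<i32>,
    text: CountingColumn<String>,
}

// Eager variant: materialises every column of the row, wanted or not.
fn row_tuple(rows: &Rows, index: usize) -> (i32, String) {
    (rows.number.value(index), rows.text.value(index))
}

// Lazy variant: the item only remembers where the row lives.
struct RowItem<'a> {
    rows: &'a Rows,
    index: usize,
}

impl<'a> RowItem<'a> {
    fn number(&self) -> i32 {
        self.rows.number.value(self.index)
    }
    fn text(&self) -> String {
        self.rows.text.value(self.index)
    }
}

fn row_item<'a>(rows: &'a Rows, index: usize) -> RowItem<'a> {
    RowItem { rows, index }
}
```

Calling only `number()` on a `RowItem` leaves the `text` column's read count at zero, whereas `row_tuple` always reads both columns.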
I am also using the same approach for `RecordBatch`, `Union`, `Map`,
`GenericList`, `FixedSizeList` and `FixedSizeBinary`.