kondziu opened a new issue, #2035:
URL: https://github.com/apache/arrow-rs/issues/2035
**Which part is this question about**
The question is about the return type of `UnionArray::child`, how it
compares to `StructArray::column`, and why they are different.
**Describe your question**
`StructArray::column` returns `&ArrayRef`, a reference to an `Arc<dyn Array>`
owned by the `StructArray` object. However, its spiritual equivalent in
`UnionArray`, the method `UnionArray::child`, returns an owned `Arc<dyn Array>`.
The implementations of both are relatively similar: `StructArray::column`
and `UnionArray::child` essentially both look up an index in an array. The only
difference that I see is that `StructArray::column` returns the reference to
what it finds, whereas `UnionArray::child` calls `clone` on that reference to
return an owned object.
```rust
// StructArray
pub fn column(&self, pos: usize) -> &ArrayRef {
    &self.boxed_fields[pos]
}
```
```rust
// UnionArray
pub fn child(&self, type_id: i8) -> ArrayRef {
    assert!(0 <= type_id);
    assert!((type_id as usize) < self.boxed_fields.len());
    self.boxed_fields[type_id as usize].clone()
}
```
It makes sense to me to always return a reference. Returning a reference
gives the user more flexibility: they can call `clone` on the reference to get
an owned `Arc`, or retain the reference to maintain a connection between the
returned column and the array from which it came. Returning an owned `Arc`, by
contrast, always engages the reference counter and pays its overhead,
regardless of whether it's necessary or not.
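To make the trade-off concrete, here is a minimal std-only sketch. `Container`, `child_ref`, `child_owned`, and `observed_counts` are hypothetical names invented for illustration (they are not part of arrow-rs); the two accessors only mirror the shapes of `StructArray::column` and `UnionArray::child`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for an array type that owns its children.
struct Container {
    children: Vec<Arc<str>>,
}

impl Container {
    // Same shape as `StructArray::column`: hands out a borrow, no refcount change.
    fn child_ref(&self, pos: usize) -> &Arc<str> {
        &self.children[pos]
    }

    // Same shape as `UnionArray::child`: always clones the Arc.
    fn child_owned(&self, pos: usize) -> Arc<str> {
        self.children[pos].clone()
    }
}

// Returns the strong count seen while only borrowing, and the strong count
// seen once an owned clone has been handed out.
fn observed_counts() -> (usize, usize) {
    let c = Container { children: vec![Arc::from("a")] };

    let borrowed = c.child_ref(0);
    let while_borrowed = Arc::strong_count(borrowed);

    let owned = c.child_owned(0);
    let while_owned = Arc::strong_count(&owned);

    (while_borrowed, while_owned)
}
```

With the borrowing accessor, a caller who does need ownership can still write `Arc::clone(c.child_ref(0))`, so the clone becomes opt-in rather than mandatory.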
I am curious why this discrepancy exists. I am guessing there is a specific
technical reason that I don't understand, in which case I'd love to find out
what it is. On the other hand, I am surreptitiously hoping this is an oversight
and I might possibly agitate to harmonize the return types :)
**Additional context**
I am writing a function that traverses a `&'a RecordBatch` following some
user-defined path and returns a column it finds at the end. The column is meant
to be downcast into a specific type (such as `UInt32Array`).
Since `StructArray::column` provides a reference with a lifetime of `'a`, I
can downcast it with `.as_any().downcast_ref()` and receive a reference with a
lifetime of `'a`. However, if the column is found inside a `UnionArray`, the
method `UnionArray::child` produces an owned object. I can only downcast this
object to a reference type, which will then only live as long as the `Arc` from
which it came, which means only until the end of the function.
A simplified example:
```rust
// Note: `exactly_one` comes from `itertools::Itertools`.
fn find_u32_column_of_struct<'a>(batch: &'a RecordBatch, path: &[&str]) -> &'a UInt32Array {
    // Find array
    let array: &'a StructArray =
        find_path_in_batch_somehow(batch, &path[0..path.len() - 1]);
    // Find column
    let schema = array.schema();
    let fields = array.fields().iter();
    let columns = array.columns().iter();
    let column: &'a Arc<dyn Array> = fields
        .zip(columns)
        .filter(|(field, _)| field.name() == path[path.len() - 1])
        .map(|(_, column)| column)
        .exactly_one()
        .unwrap();
    // Downcast to UInt32Array; the result borrows from `batch` for `'a`
    column.as_ref()
        .as_any()
        .downcast_ref::<UInt32Array>()
        .unwrap()
}
```
```rust
// Note: `exactly_one` comes from `itertools::Itertools`.
fn find_u32_child_of_union<'a>(batch: &'a RecordBatch, path: &[&str]) -> &'a UInt32Array {
    // Find array
    let union: &'a UnionArray =
        find_path_in_batch_somehow(batch, &path[0..path.len() - 1]);
    // Find child
    let index = union.type_names()
        .into_iter()
        .enumerate()
        .filter(|(_index, type_name)| type_name == &path[path.len() - 1])
        .map(|(index, _)| index as i8)
        .exactly_one()
        .unwrap();
    let child: Arc<dyn Array> = union.child(index);
    // Downcast to UInt32Array.
    // Problem: the returned reference borrows from the local `child`,
    // which is dropped at the end of the function, so it cannot live for `'a`.
    child.as_ref()
        .as_any()
        .downcast_ref::<UInt32Array>()
        .unwrap()
}
```
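The lifetime difference between the two functions above can be reproduced without arrow-rs. The following is only a sketch under stated assumptions: `Child`, `Container`, `child_ref`, `child_owned`, `sample`, `find_u32`, and `find_u32_owned` are hypothetical names, and `std::any::Any` stands in for `dyn Array`. It is a model of the problem, not of the arrow-rs API.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical type-erased child, standing in for `Arc<dyn Array>`.
type Child = Arc<dyn Any + Send + Sync>;

// Hypothetical container, standing in for a struct or union array.
struct Container {
    children: Vec<Child>,
}

impl Container {
    // Reference-returning accessor (shape of `StructArray::column`).
    fn child_ref(&self, pos: usize) -> &Child {
        &self.children[pos]
    }

    // Owned accessor (shape of `UnionArray::child`).
    fn child_owned(&self, pos: usize) -> Child {
        Arc::clone(&self.children[pos])
    }
}

// Builds a container with one `u32` child and one `&str` child.
fn sample() -> Container {
    let number: Child = Arc::new(7u32);
    let text: Child = Arc::new("seven");
    Container { children: vec![number, text] }
}

// Compiles: the downcast reference borrows from `c`, so it can live for `'a`.
fn find_u32<'a>(c: &'a Container, pos: usize) -> Option<&'a u32> {
    c.child_ref(pos).downcast_ref::<u32>()
}

// With only the owned accessor, the analogous function is rejected by the
// borrow checker, because the reference would point into a local `Arc`:
//
// fn find_u32_via_owned<'a>(c: &'a Container, pos: usize) -> Option<&'a u32> {
//     let child = c.child_owned(pos);
//     child.downcast_ref::<u32>() // borrows from `child`, dropped on return
// }

// One way out in this std-only model is to keep ownership in the return type.
fn find_u32_owned(c: &Container, pos: usize) -> Option<Arc<u32>> {
    c.child_owned(pos).downcast::<u32>().ok()
}
```

In this model, `find_u32(&sample(), 0)` yields a borrowed `u32`, while the owned variant forces the caller to carry an `Arc` around even when a plain borrow would do.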