kondziu opened a new issue, #2035:
URL: https://github.com/apache/arrow-rs/issues/2035

   **Which part is this question about**
   The question is about the return type of `UnionArray::child`, how it 
compares to `StructArray::column`, and why they are different.
   
   **Describe your question**
   `StructArray::column` returns `&ArrayRef`, a reference to an `Arc<dyn Array>` 
owned by the `StructArray` object. However, its spiritual equivalent in 
`UnionArray`, the method `UnionArray::child`, returns an owned `Arc<dyn Array>`. 
   
   The implementations of both are relatively similar: `StructArray::column` 
and `UnionArray::child` essentially both look up an index in an array. The only 
difference that I see is that `StructArray::column` returns the reference to 
what it finds, whereas `UnionArray::child` calls `clone` on that reference to 
return an owned object. 
   
   ```rust
   // StructArray
   pub fn column(&self, pos: usize) -> &ArrayRef {
       &self.boxed_fields[pos]
   }
   ```
   
   ```rust
   // Union
   pub fn child(&self, type_id: i8) -> ArrayRef {
       assert!(0 <= type_id);
       assert!((type_id as usize) < self.boxed_fields.len());
       self.boxed_fields[type_id as usize].clone()
   }
   ```
   
   It makes sense to me to always return a reference. Returning a reference 
gives the user more flexibility: they can call `clone` on the reference to get 
an owned `Arc`, or retain the reference to maintain a connection between the 
returned column and the array from which it came. Returning an owned `Arc`, by 
contrast, always engages the reference counter and pays its overhead, 
regardless of whether it's necessary.
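
To make the trade-off concrete, here is a minimal sketch that uses only the standard library. `Column`, `Ints`, `Container` and `demo_counts` are hypothetical stand-ins invented for illustration, not arrow types; the two accessors only mimic the shape of `StructArray::column` and `UnionArray::child` quoted above.

```rust
use std::sync::Arc;

// Hypothetical stand-in for arrow's `Array` trait (not the real trait).
trait Column {
    fn len(&self) -> usize;
}

struct Ints(Vec<u32>);

impl Column for Ints {
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Hypothetical container that stores its children the way both arrays do.
struct Container {
    boxed_fields: Vec<Arc<dyn Column>>,
}

impl Container {
    // Borrowing accessor, in the style of `StructArray::column`.
    fn column(&self, pos: usize) -> &Arc<dyn Column> {
        &self.boxed_fields[pos]
    }

    // Cloning accessor, in the style of `UnionArray::child`.
    fn child(&self, pos: usize) -> Arc<dyn Column> {
        self.boxed_fields[pos].clone()
    }
}

// Returns the strong count observed after borrowing, after an explicit
// clone by the caller, and after calling the cloning accessor.
fn demo_counts() -> (usize, usize, usize) {
    let ints: Arc<dyn Column> = Arc::new(Ints(vec![1, 2, 3]));
    let container = Container { boxed_fields: vec![ints] };

    // Borrowing leaves the reference counter untouched.
    let borrowed = container.column(0);
    let after_borrow = Arc::strong_count(borrowed);

    // The caller can still opt in to an owned handle when one is needed.
    let owned = Arc::clone(borrowed);
    let after_clone = Arc::strong_count(&owned);

    // The cloning accessor increments the counter on every call.
    let child = container.child(0);
    let after_child = Arc::strong_count(&child);
    assert_eq!(child.len(), 3);

    (after_borrow, after_clone, after_child)
}
```

With the borrowing accessor the caller decides whether to pay for the counter increment; with the cloning accessor that choice is made for them.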
   
   I am curious why this discrepancy exists. I am guessing there is a specific 
technical reason that I don't understand, in which case I'd love to find out 
what it is. On the other hand, I am surreptitiously hoping this is an oversight 
and I might possibly agitate to harmonize the return types :)
   
   **Additional context**
   I am writing a function that traverses a `&'a RecordBatch` following some 
user-defined path and returns a column it finds at the end. The column is meant 
to be downcast into a specific type (such as `UInt32Array`). 
   
   Since `StructArray::column` provides a reference with a lifetime of `'a`, I 
can downcast it with `.as_any().downcast_ref()` and receive a reference with a 
lifetime `'a`. However, if the column is found inside a `UnionArray`, the 
method `UnionArray::child` produces an owned object. I can only downcast this 
object to a reference type, which will then only live as long as the `Arc` from 
which it came, which means only until the end of the function.
   
   A simplified example:
   ```rust
   fn find_u32_column_of_struct<'a>(batch: &'a RecordBatch, path: &[&str]) -> &'a UInt32Array {
       // Find array
       let array: &'a StructArray = find_path_in_batch_somehow(batch, &path[0..path.len() - 1]);
     
       // Find column
       let schema = array.schema();
       let fields = array.fields().iter();
       let columns = array.columns().iter();
       let column: &'a Arc<dyn Array> = fields
           .zip(columns)
           .filter(|(field, _)| field.name() == path[path.len() - 1])
           .map(|(_, column)| column)
           .exactly_one()
           .unwrap();
     
       // Downcast to UInt32Array    
       column.as_ref()
           .as_any()
           .downcast_ref()
           .unwrap()
   }
   ```
   
   ```rust
   fn find_u32_child_of_union<'a>(batch: &'a RecordBatch, path: &[&str]) -> &'a UInt32Array {
       // Find array
       let union: &'a UnionArray = find_path_in_batch_somehow(batch, &path[0..path.len() - 1]);
     
       // Find child
       let index = union.type_names()
           .into_iter()
           .enumerate()
           .filter(|(_index, type_name)| type_name == &path[path.len() - 1])
           .map(|(index, _)| index as i8)
           .exactly_one()
           .unwrap();
       let child: Arc<dyn Array> = union.child(index);
     
       // Downcast to UInt32Array    
       child.as_ref()
          .as_any()
          .downcast_ref()
          .unwrap()
   }
   ```
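
The difference between the two functions above can be reproduced without arrow at all. The following is a sketch under stated assumptions: `Holder`, `first_u32s`, `with_first_u32s` and `demo_lifetimes` are hypothetical names, and `Arc<dyn Any + Send + Sync>` merely stands in for `ArrayRef`. The closure-based helper is one possible way to live with a cloning accessor, not something arrow provides.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical holder standing in for a struct or union array.
struct Holder {
    children: Vec<Arc<dyn Any + Send + Sync>>,
}

impl Holder {
    // Borrowing accessor: the result is tied to the borrow of `self`.
    fn column(&self, pos: usize) -> &Arc<dyn Any + Send + Sync> {
        &self.children[pos]
    }

    // Cloning accessor: the result is an independent, owned handle.
    fn child(&self, pos: usize) -> Arc<dyn Any + Send + Sync> {
        self.children[pos].clone()
    }
}

// Compiles: the downcast reference borrows from `holder`, so it lives for `'a`.
fn first_u32s<'a>(holder: &'a Holder) -> Option<&'a Vec<u32>> {
    holder.column(0).downcast_ref::<Vec<u32>>()
}

// The analogous function built on `child` is rejected by the borrow checker,
// because the returned reference would point into a local `Arc`:
//
// fn first_u32s_owned<'a>(holder: &'a Holder) -> Option<&'a Vec<u32>> {
//     let child = holder.child(0);
//     child.downcast_ref::<Vec<u32>>()
// }

// Possible workaround: hand the short-lived reference to a closure, so it
// never has to outlive the local `Arc`.
fn with_first_u32s<R>(holder: &Holder, f: impl FnOnce(&Vec<u32>) -> R) -> Option<R> {
    let child = holder.child(0);
    let result = child.downcast_ref::<Vec<u32>>().map(f);
    result
}

// Sums the first child through both access paths.
fn demo_lifetimes() -> (u32, u32) {
    let values: Arc<dyn Any + Send + Sync> = Arc::new(vec![1u32, 2, 3]);
    let holder = Holder { children: vec![values] };

    let borrowed_sum = first_u32s(&holder)
        .map(|v| v.iter().sum::<u32>())
        .unwrap_or(0);
    let scoped_sum = with_first_u32s(&holder, |v| v.iter().sum::<u32>()).unwrap_or(0);

    (borrowed_sum, scoped_sum)
}
```

The closure variant works, but it inverts control flow for the caller, which is part of why a borrowing accessor would be more convenient here.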
   
   
   

