jorisvandenbossche commented on issue #41833:
URL: https://github.com/apache/arrow/issues/41833#issuecomment-2141612574

   Note that if this is about accessing a subfield of a struct array: at that 
point you typically (depending on the exact use case) want to 
"propagate" the parent struct's null values to the child field as well. 
   
   For that reason, pyarrow provides two separate APIs to get the child array 
(using your original example as `arr`):
   
   ```python
   # getting the "raw" child array as stored under the hood
   >>> arr.field("outer").field("inner_1")
   Out[14]: 
   <pyarrow.lib.Int64Array object at 0x7f23734339a0>
   [
     1,
     3,
     0
   ]
   
   # getting the "logical" child array
   >>> pc.struct_field(arr, ["outer", "inner_1"])
   Out[20]: 
   <pyarrow.lib.Int64Array object at 0x7f237276b3a0>
   [
     1,
     3,
     null
   ]
   ```
   
   This API is far from ideal. On the C++ side, there is a 
`StructArray::GetFlattenedField` that gives you this logical, "flattened" 
version with nulls propagated. I personally think the most obvious thing to 
reach for as a user (`.field(..)`) should give you what most users expect (and 
what is safest), i.e. the flattened version with nulls propagated. See 
https://github.com/apache/arrow/issues/14970 for this.
   
   I assume that on the datafusion side, there should also be some distinction 
between those two ways to get a field.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
