[ 
https://issues.apache.org/jira/browse/ARROW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325837#comment-17325837
 ] 

Joris Van den Bossche commented on ARROW-5869:
----------------------------------------------

Thanks for checking [~amol-]!

> [Python] Need a way to access UnionArray's children as Arrays in pyarrow
> ------------------------------------------------------------------------
>
>                 Key: ARROW-5869
>                 URL: https://issues.apache.org/jira/browse/ARROW-5869
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: Jim Pivarski
>            Priority: Major
>
>  
> There doesn't seem to be a way to get to the children of sparse or dense 
> UnionArrays. For other types, there's
>  * ListType: array.flatten()
>  * StructType: array.field("fieldname")
>  * DictionaryType: array.indices and now array.dictionary (in 0.14.0)
>  * (other types have no children, I think...)
> The reason this comes up now is that I have a downstream library that does a 
> zero-copy view of Arrow by recursively walking over its types and 
> interpreting the list of buffers for each type. In the past, I didn't need 
> the _array_ children of each array—I popped the right number of buffers off 
> the list depending on the type—but now the dictionary for DictionaryType has 
> been moved from the type object to the array object (in 0.14.0). Since it's 
> neither in the buffers list, nor in the type tree, I need to walk the tree of 
> arrays in tandem with the tree of types.
> That would be okay, except that I don't see how to descend from a UnionArray 
> to its children.
> This is the function where I do the walk down types (tpe), and now arrays 
> (array), while interpreting the right number of buffers at each step.
> [https://github.com/scikit-hep/awkward-array/blob/7c5961405cc39bbf2b489fad171652019c8de41b/awkward/arrow.py#L228-L364]
> Simply exposing the std::vector named "children" as a Python sequence or a 
> child(int i) method would provide a way to descend UnionTypes and make this 
> kind of access uniform across all types.
> Alternatively, putting the array.dictionary in the list of buffers would also 
> do it (and make it unnecessary for me to walk over the arrays), but in 
> general it seems like a good idea to make arrays accessible. It seems like it 
> belongs in the buffers, but that would probably be a big change, not to be 
> undertaken for minor reasons.
> Thanks!
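
For illustration, the uniform access the description asks for (a "children" sequence or a child(int i) method on every array) can be sketched with a toy model. This is a minimal sketch only: Node, num_children, child and walk are hypothetical names invented here, not pyarrow API, and the tree holds type names rather than real Arrow buffers.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical toy model, NOT the pyarrow API: every node exposes its
# children through the same child(i) accessor, whatever its type is.
@dataclass
class Node:
    type_name: str  # e.g. "union", "list", "int64"
    children: List["Node"] = field(default_factory=list)

    def num_children(self) -> int:
        return len(self.children)

    def child(self, i: int) -> "Node":
        return self.children[i]


def walk(node: Node, depth: int = 0) -> List[str]:
    """Depth-first walk that never special-cases the node type."""
    lines = ["  " * depth + node.type_name]
    for i in range(node.num_children()):
        lines.extend(walk(node.child(i), depth + 1))
    return lines


# A union of int64 and list<float64>, described only by type names.
tree = Node("union", [Node("int64"), Node("list", [Node("float64")])])
visited = walk(tree)
```

Because the walker only calls num_children() and child(i), a union node is descended the same way as a list or struct node, which is the uniformity requested above.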



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
