Jim Pivarski created ARROW-5869:
-----------------------------------
Summary: Need a way to access UnionArray's children as Arrays in pyarrow
Key: ARROW-5869
URL: https://issues.apache.org/jira/browse/ARROW-5869
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.0
Reporter: Jim Pivarski
There doesn't seem to be a way to get to the children of sparse or dense
UnionArrays. For other types, there are accessors:
* ListType: array.flatten()
* StructType: array.field("fieldname")
* DictionaryType: array.indices and now array.dictionary (in 0.14.0)
* (other types have no children, I think...)
The reason this comes up now is that I have a downstream library that does a
zero-copy view of Arrow by recursively walking over its types and interpreting
the list of buffers for each type. In the past, I didn't need the _array_
children of each array—I popped the right number of buffers off the list
depending on the type—but now the dictionary for DictionaryType has been moved
from the type object to the array object (in 0.14.0). Since it's neither in the
buffers list, nor in the type tree, I need to walk the tree of arrays in tandem
with the tree of types.
That would be okay, except that I don't see how to descend from a UnionArray to
its children.
This is the function where I do the walk down types (tpe), and now arrays
(array), while interpreting the right number of buffers at each step.
[https://github.com/scikit-hep/awkward-array/blob/7c5961405cc39bbf2b489fad171652019c8de41b/awkward/arrow.py#L228-L364]
Simply exposing the std::vector named "children" as a Python sequence or a
child(int i) method would provide a way to descend into UnionTypes and make
this kind of access uniform across all types.
Alternatively, putting array.dictionary in the list of buffers would also
solve the problem (and make it unnecessary for me to walk over the arrays), but
in general it seems like a good idea to make child arrays accessible. The
dictionary does seem to belong in the buffers, but moving it there would
probably be a big change, not to be undertaken for minor reasons.
Thanks!