[ https://issues.apache.org/jira/browse/ARROW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325817#comment-17325817 ]
Alessandro Molina commented on ARROW-5869:
------------------------------------------
This appears to have been addressed already: the children of a
{{UnionArray}} can now be accessed with {{UnionArray.field}}.
{code:python}
>>> import pyarrow as pa
>>> first = pa.array([1, 2, 3])
>>> second = pa.array(["A", "B", "C"])
>>> # the type ids array is expected to be int8
>>> ua = pa.UnionArray.from_sparse(pa.array([0, 0, 1], type=pa.int8()), [first, second])
>>> ua.field(0)
<pyarrow.lib.Int64Array object at 0x126d84520>
[
1,
2,
3
]
>>> ua.field(1)
<pyarrow.lib.StringArray object at 0x126d844c0>
[
"A",
"B",
"C"
]
{code}
> [Python] Need a way to access UnionArray's children as Arrays in pyarrow
> ------------------------------------------------------------------------
>
> Key: ARROW-5869
> URL: https://issues.apache.org/jira/browse/ARROW-5869
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.0
> Reporter: Jim Pivarski
> Priority: Major
>
>
> There doesn't seem to be a way to get to the children of sparse or dense
> UnionArrays. For other types, there's
> * ListType: array.flatten()
> * StructType: array.field("fieldname")
> * DictionaryType: array.indices and now array.dictionary (in 0.14.0)
> * (other types have no children, I think...)
> The reason this comes up now is that I have a downstream library that does a
> zero-copy view of Arrow by recursively walking over its types and
> interpreting the list of buffers for each type. In the past, I didn't need
> the _array_ children of each array—I popped the right number of buffers off
> the list depending on the type—but now the dictionary for DictionaryType has
> been moved from the type object to the array object (in 0.14.0). Since it's
> neither in the buffers list, nor in the type tree, I need to walk the tree of
> arrays in tandem with the tree of types.
> That would be okay, except that I don't see how to descend from a UnionArray
> to its children.
> This is the function where I do the walk down types (tpe), and now arrays
> (array), while interpreting the right number of buffers at each step.
> [https://github.com/scikit-hep/awkward-array/blob/7c5961405cc39bbf2b489fad171652019c8de41b/awkward/arrow.py#L228-L364]
> Simply exposing the std::vector named "children" as a Python sequence or a
> child(int i) method would provide a way to descend UnionTypes and make this
> kind of access uniform across all types.
> Alternatively, putting the array.dictionary in the list of buffers would also
> do it (and make it unnecessary for me to walk over the arrays), but in
> general it seems like a good idea to make arrays accessible. It seems like it
> belongs in the buffers, but that would probably be a big change, not to be
> undertaken for minor reasons.
> Thanks!