[ 
https://issues.apache.org/jira/browse/ARROW-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17325817#comment-17325817
 ] 

Alessandro Molina commented on ARROW-5869:
------------------------------------------

This seems to have been addressed already: the children of a {{UnionArray}} 
can now be accessed with {{UnionArray.field}}, for example:


{code:python}
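>>> import pyarrow as pa
>>> first = pa.array([1, 2, 3])
>>> second = pa.array(["A", "B", "C"])
>>> # from_sparse expects the type ids as an int8 array
>>> types = pa.array([0, 0, 1], type=pa.int8())
>>> ua = pa.UnionArray.from_sparse(types, [first, second])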
>>> ua.field(0)
<pyarrow.lib.Int64Array object at 0x126d84520>
[
  1,
  2,
  3
]
>>> ua.field(1)
<pyarrow.lib.StringArray object at 0x126d844c0>
[
  "A",
  "B",
  "C"
]
{code}



> [Python] Need a way to access UnionArray's children as Arrays in pyarrow
> ------------------------------------------------------------------------
>
>                 Key: ARROW-5869
>                 URL: https://issues.apache.org/jira/browse/ARROW-5869
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.0
>            Reporter: Jim Pivarski
>            Priority: Major
>
>  
> There doesn't seem to be a way to get to the children of sparse or dense 
> UnionArrays. For other types, there's
>  * ListType: array.flatten()
>  * StructType: array.field("fieldname")
>  * DictionaryType: array.indices and now array.dictionary (in 0.14.0)
>  * (other types have no children, I think...)
> The reason this comes up now is that I have a downstream library that does a 
> zero-copy view of Arrow by recursively walking over its types and 
> interpreting the list of buffers for each type. In the past, I didn't need 
> the _array_ children of each array—I popped the right number of buffers off 
> the list depending on the type—but now the dictionary for DictionaryType has 
> been moved from the type object to the array object (in 0.14.0). Since it's 
> neither in the buffers list, nor in the type tree, I need to walk the tree of 
> arrays in tandem with the tree of types.
> That would be okay, except that I don't see how to descend from a UnionArray 
> to its children.
> This is the function where I do the walk down types (tpe), and now arrays 
> (array), while interpreting the right number of buffers at each step.
> [https://github.com/scikit-hep/awkward-array/blob/7c5961405cc39bbf2b489fad171652019c8de41b/awkward/arrow.py#L228-L364]
> Simply exposing the std::vector named "children" as a Python sequence or a 
> child(int i) method would provide a way to descend UnionTypes and make this 
> kind of access uniform across all types.
> Alternatively, putting the array.dictionary in the list of buffers would also 
> do it (and make it unnecessary for me to walk over the arrays), but in 
> general it seems like a good idea to make arrays accessible. It seems like it 
> belongs in the buffers, but that would probably be a big change, not to be 
> undertaken for minor reasons.
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
