[jira] [Commented] (ARROW-12099) [Python] Explode array column

Joris Van den Bossche (Jira) Wed, 31 Mar 2021 05:58:04 -0700


    [ https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312363#comment-17312363 ]


Joris Van den Bossche commented on ARROW-12099:
-----------------------------------------------

I _assume_ your example starts with a table like the following?

{code}
In [100]: table = pa.table({'a': [0, 1, 2], 'b': [[4, 5, 6]]*3})

In [101]: table.to_pandas()
Out[101]: 
   a          b
0  0  [4, 5, 6]
1  1  [4, 5, 6]
2  2  [4, 5, 6]
{code}

The function I wrote above to explode a list column in such a table gives:

{code}
In [102]: explode_table(table, 'b').to_pandas()
Out[102]: 
   a  b
0  0  4
1  0  5
2  0  6
3  1  4
4  1  5
5  1  6
6  2  4
7  2  5
8  2  6
{code}

which seems to be the same output as you showed above?

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, 
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] 
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top-level only (not recursively).
> This would also work with the existing 
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
>  method to allow fully unnesting a 
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
