[jira] [Commented] (ARROW-12099) [Python] Explode array column

Malthe Borch (Jira) Mon, 29 Mar 2021 09:02:07 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310753#comment-17310753
 ]


Malthe Borch commented on ARROW-12099:
--------------------------------------

[~jorisvandenbossche] in Spark, explode does not actually "zip" arrays in
different columns; it just copies the entire row once for each value in the
exploded (array) column, so an array with N values yields N rows in place of
the original row. Rinse and repeat for all rows in the original dataframe.

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, 
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] 
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top level only (not recursively).
> Combined with the existing 
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
> method, this would allow fully unnesting a 
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
