[jira] [Commented] (ARROW-12099) [Python] Explode array column

Joris Van den Bossche (Jira) Mon, 29 Mar 2021 08:13:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310717#comment-17310717
 ]


Joris Van den Bossche commented on ARROW-12099:
-----------------------------------------------

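In the {{pyarrow.compute}} module, we also have {{list_parent_indices}}, which 
returns, for each element of the flattened list column, the index of the row it 
came from. It can be used to join the flattened list column with the 
correspondingly repeated rows of the original table.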
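To make the indexing concrete, here is a plain-Python model (no pyarrow, and not how pyarrow implements these kernels) of what {{list_flatten}} and {{list_parent_indices}} return for a list column such as {{b}} in the example table, and of how taking the other columns at those indices performs the join. The column-dict representation and the {{*_model}} helper names are hypothetical, for illustration only.

```python
def list_flatten_model(lists):
    # Concatenate the elements of every non-null list, in row order.
    return [value for lst in lists if lst is not None for value in lst]


def list_parent_indices_model(lists):
    # For each element list_flatten would emit, record the row it came from.
    # Null and empty lists contribute no elements, so their rows never appear.
    return [row for row, lst in enumerate(lists) if lst is not None for _ in lst]


def explode_rows_model(columns, column):
    # Hypothetical column-oriented table: dict of column name -> list of values.
    lists = columns[column]
    indices = list_parent_indices_model(lists)
    # "Take" the other columns at the parent indices (the join step) ...
    result = {name: [values[i] for i in indices]
              for name, values in columns.items() if name != column}
    # ... and append the flattened values as the exploded column.
    result[column] = list_flatten_model(lists)
    return result


print(explode_rows_model({'a': [0, 1, 2], 'b': [[1, 2], None, [3, 4, 5]]}, 'b'))
```

Because the parent indices are non-decreasing, the repeated rows stay in their original order, which is why the exploded table lines up element by element with the flattened column.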

So it's already possible to write an explode function in Python with this 
functionality:

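{code:python}
import pyarrow as pa
import pyarrow.compute as pc

def explode_table(table, column):
    # All columns except the one being exploded
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    # For each element of the flattened list column, the index of its parent row
    indices = pc.list_parent_indices(table[column])
    # Repeat the other columns once per list element
    # (rows whose list is null or empty produce no output rows)
    result = table.select(other_columns).take(indices)
    # Append the flattened values, typed as the list's value type
    result = result.append_column(
        pa.field(column, table.schema.field(column).type.value_type),
        pc.list_flatten(table[column]))
    return result
{code}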

{code}
In [80]: table = pa.table({'a': range(3), 'b': [[1, 2], None, [3, 4, 5]]})

In [81]: explode_table(table, 'b')
Out[81]: 
pyarrow.Table
a: int64
b: int64

In [82]: explode_table(table, 'b').to_pandas()
Out[82]: 
   a  b
0  0  1
1  0  2
2  2  3
3  2  4
4  2  5
{code}

That said, I think it could be nice to provide this functionality in pyarrow 
itself.

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, 
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] 
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top level only (not recursively).
> This would also work with the existing 
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
>  method to allow fully unnesting a 
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
