[
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17310717#comment-17310717
]
Joris Van den Bossche commented on ARROW-12099:
-----------------------------------------------
In the {{pyarrow.compute}} module, we also have {{list_parent_indices}} which
can be used to join the flattened list column with repeated rows of the
original table.
So it's already possible to write an explode function with this functionality
in Python:
{code:python}
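import pyarrow as pa
import pyarrow.compute as pc

def explode_table(table, column):
    # All columns except the list column being exploded
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    # For each element of the flattened list column, the index of its parent row
    indices = pc.list_parent_indices(table[column])
    # Repeat each row of the other columns once per element of its list
    result = table.select(other_columns).take(indices)
    # Append the flattened list values, typed as the list's value type
    result = result.append_column(
        pa.field(column, table.schema.field(column).type.value_type),
        pc.list_flatten(table[column]))
    return result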
{code}
{code}
In [80]: table = pa.table({'a': range(3), 'b': [[1, 2], None, [3, 4, 5]]})

In [81]: explode_table(table, 'b')
Out[81]:
pyarrow.Table
a: int64
b: int64

In [82]: explode_table(table, 'b').to_pandas()
Out[82]:
   a  b
0  0  1
1  0  2
2  2  3
3  2  4
4  2  5
{code}
That said, I think it could be nice to provide this functionality in pyarrow
itself.
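For reference, the two kernels that {{explode_table}} relies on can be inspected on a plain array. In this small sketch (the sample data is made up for illustration), {{list_parent_indices}} gives, for every flattened element, the index of the list it came from, and {{list_flatten}} concatenates the list values. Null and empty lists contribute no elements, which is why row 1 disappears from the exploded table in the session above.

{code:python}
import pyarrow as pa
import pyarrow.compute as pc

# A list column containing a null entry and an empty list
arr = pa.array([[1, 2], None, [], [3, 4, 5]])

# Parent row index of each flattened element; null and empty lists add nothing
parents = pc.list_parent_indices(arr)
# All list elements concatenated into one flat array
values = pc.list_flatten(arr)

print(parents.to_pylist())  # [0, 0, 3, 3, 3]
print(values.to_pylist())   # [1, 2, 3, 4, 5]
{code}

Taking the other columns at {{parents}} and pairing them with {{values}} is exactly the join described in the comment.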
> [Python] Explode array column
> -----------------------------
>
> Key: ARROW-12099
> URL: https://issues.apache.org/jira/browse/ARROW-12099
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Malthe Borch
> Priority: Major
>
> In Apache Spark,
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode]
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top level only (not recursively).
> This would also work with the existing
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
> method to allow fully unnesting a
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].
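The combination described in the issue can be sketched with the helper from the comment above, repeated here so the snippet runs on its own; the table, column names and data are hypothetical. One list level is exploded, and then the existing {{Table.flatten}} splits the resulting struct column into one column per field.

{code:python}
import pyarrow as pa
import pyarrow.compute as pc

def explode_table(table, column):
    # Same approach as in the comment: repeat the other rows, append flattened values
    other_columns = list(table.schema.names)
    other_columns.remove(column)
    indices = pc.list_parent_indices(table[column])
    result = table.select(other_columns).take(indices)
    return result.append_column(
        pa.field(column, table.schema.field(column).type.value_type),
        pc.list_flatten(table[column]))

# Hypothetical table with a list<struct<x: int64, y: string>> column
table = pa.table({
    'id': [1, 2],
    'items': [[{'x': 1, 'y': 'a'}, {'x': 2, 'y': 'b'}], [{'x': 3, 'y': 'c'}]],
})

# Explode the list level, then flatten the struct into 'items.x' and 'items.y'
unnested = explode_table(table, 'items').flatten()
print(unnested.column_names)  # ['id', 'items.x', 'items.y']
print(unnested.to_pydict())
{code}

Because this explode only works one level at a time, a column of type list<list<...>> would need one {{explode_table}} call per nesting level.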