[
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405371#comment-17405371
]
Ian Cook edited comment on ARROW-12099 at 8/26/21, 5:05 PM:
------------------------------------------------------------
+1. Hive also has an
[{{explode}}|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-explode]
function that works like this, but it is very difficult to use at the table
level: you need something called a [lateral
view|https://cwiki.apache.org/confluence/display/hive/languagemanual+lateralview]
to do that, and the API is unintuitive.
[~jorisvandenbossche] I think your example in the previous comment is exactly
correct. It would be very nice to have an {{explode_table}} kernel like that in
the Arrow C++ library, exposed to Python and R through bindings.
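To make the expected semantics concrete, here is a dependency-free sketch (not the pyarrow API) of what an {{explode_table}} kernel could compute. It models the Arrow ListArray layout with plain Python lists: offsets give each row's slice of child values, "parent indices" map every child value back to its row, and every other column is taken at those indices. The names {{list_offsets}}, {{parent_indices}} and {{explode_table}} are hypothetical. Null and empty lists produce no rows here, which matches Spark's {{explode}} (Spark's {{explode_outer}} keeps them). In pyarrow, the analogous building blocks would be {{pyarrow.compute.list_flatten}}, {{pyarrow.compute.list_parent_indices}} and {{Table.take}}.

```python
from typing import Dict, List, Optional, Sequence


def list_offsets(lists: Sequence[Optional[list]]) -> List[int]:
    """Build Arrow-style offsets: row i owns values[offsets[i]:offsets[i + 1]]."""
    offsets = [0]
    for item in lists:
        # A null list occupies zero child slots, just like an empty list.
        offsets.append(offsets[-1] + (len(item) if item is not None else 0))
    return offsets


def parent_indices(offsets: Sequence[int]) -> List[int]:
    """For every child value, the index of the row (list) it belongs to."""
    parents: List[int] = []
    for row in range(len(offsets) - 1):
        parents.extend([row] * (offsets[row + 1] - offsets[row]))
    return parents


def explode_table(table: Dict[str, list], column: str) -> Dict[str, list]:
    """Hypothetical explode_table: one output row per element of `column`."""
    lists = table[column]
    parents = parent_indices(list_offsets(lists))
    out: Dict[str, list] = {}
    for name, values in table.items():
        if name == column:
            # The exploded column is the flattened child values, in order.
            out[name] = [v for item in lists if item is not None for v in item]
        else:
            # Other columns are "taken" at the parent indices, repeating values.
            out[name] = [values[p] for p in parents]
    return out


table = {"id": [1, 2, 3, 4], "tags": [["a", "b"], [], None, ["c"]]}
print(explode_table(table, "tags"))
# {'id': [1, 1, 4], 'tags': ['a', 'b', 'c']}
```

Splitting the work into "flatten the child values" and "take the other columns at the parent indices" keeps the kernel independent of the element type of the list column.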
In addition to working on ListArrays as in this example, this should also
work on MapArrays. When called on a MapArray, it should return two exploded
columns: one with the keys and one with the values.
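A rough sketch of the MapArray case, again in plain Python rather than pyarrow: each map is modelled as a list of (key, value) pairs, mirroring Arrow's list<struct<key, value>> storage, and the map column is replaced by two exploded columns. The function name {{explode_map_table}} and the output column names {{<column>_key}} / {{<column>_value}} are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Entry = Tuple[object, object]


def explode_map_table(table: Dict[str, list], column: str) -> Dict[str, list]:
    """Hypothetical map explode: one row per (key, value) entry of `column`.

    Each map is a list of (key, value) pairs. The map column is replaced by
    two columns whose names (`<column>_key`, `<column>_value`) are assumed.
    """
    maps: Sequence[Optional[List[Entry]]] = table[column]
    parents: List[int] = []
    keys: list = []
    items: list = []
    for row, entries in enumerate(maps):
        for key, value in entries or []:  # null and empty maps yield no rows
            parents.append(row)
            keys.append(key)
            items.append(value)
    out: Dict[str, list] = {}
    for name, values in table.items():
        if name == column:
            out[f"{column}_key"] = keys
            out[f"{column}_value"] = items
        else:
            # Non-map columns repeat once per map entry of their row.
            out[name] = [values[p] for p in parents]
    return out


table = {"id": [1, 2], "attrs": [[("color", "red"), ("size", "L")], None]}
print(explode_map_table(table, "attrs"))
# {'id': [1, 1], 'attrs_key': ['color', 'size'], 'attrs_value': ['red', 'L']}
```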
> [Python] Explode array column
> -----------------------------
>
> Key: ARROW-12099
> URL: https://issues.apache.org/jira/browse/ARROW-12099
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Malthe Borch
> Priority: Major
>
> In Apache Spark,
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode]
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top level only (not recursively).
> Together with the existing
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
> method, this would allow fully unnesting a
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].
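A plain-Python sketch of how explode and flatten could combine to fully unnest a list<struct> column. {{explode_column}} unnests one level only, as Spark's explode does, and {{flatten_struct}} splits struct values (dicts here) into one column per field, named "<column>.<field>" the way pyarrow's {{Table.flatten}} names flattened struct fields. Both helpers, and passing the field names explicitly, are assumptions for illustration, not an existing API.

```python
from typing import Dict, List, Optional, Sequence


def explode_column(table: Dict[str, list], column: str) -> Dict[str, list]:
    """Explode one level of `column`; null/empty lists produce no rows."""
    lists: Sequence[Optional[list]] = table[column]
    parents = [row for row, item in enumerate(lists) for _ in (item or [])]
    out: Dict[str, list] = {}
    for name, values in table.items():
        if name == column:
            # Only the top level is unnested; nested lists stay lists.
            out[name] = [v for item in lists for v in (item or [])]
        else:
            out[name] = [values[p] for p in parents]
    return out


def flatten_struct(table: Dict[str, list], column: str,
                   fields: List[str]) -> Dict[str, list]:
    """Replace a struct column with one "<column>.<field>" column per field."""
    out: Dict[str, list] = {}
    for name, values in table.items():
        if name == column:
            for field in fields:
                # A null struct yields null for every field.
                out[f"{column}.{field}"] = [
                    None if s is None else s.get(field) for s in values
                ]
        else:
            out[name] = values
    return out


table = {
    "id": [1, 2],
    "items": [[{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}],
              [{"sku": "c", "qty": 5}]],
}
flat = flatten_struct(explode_column(table, "items"), "items", ["sku", "qty"])
print(flat)
# {'id': [1, 1, 2], 'items.sku': ['a', 'b', 'c'], 'items.qty': [2, 1, 5]}

# One explode only removes the outer level of a list<list<...>> value.
print(explode_column({"xs": [[[1, 2], [3]]]}, "xs"))
# {'xs': [[1, 2], [3]]}
```

Exploding first and flattening second keeps each step single-purpose; deeper nesting would need repeated explode/flatten passes.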