[ 
https://issues.apache.org/jira/browse/ARROW-12099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405371#comment-17405371
 ] 

Ian Cook edited comment on ARROW-12099 at 8/26/21, 5:05 PM:
------------------------------------------------------------

+1. Hive also has an 
[{{explode}}|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-explode]
 function that works like this, but it is very difficult to use at the table 
level: you need to use something called a [lateral 
view|https://cwiki.apache.org/confluence/display/hive/languagemanual+lateralview]
 to do that, and the API is very unintuitive.

[~jorisvandenbossche] I think your example in the previous comment is exactly 
correct. It would be very nice to have an {{explode_table}} kernel like that in 
the Arrow C++ library, exposed to Python and R through bindings.

In addition to working on ListArrays as in this example, the kernel should 
also work on MapArrays. When called on a MapArray, it should return two 
exploded columns: one with the keys and one with the values.


was (Author: icook):
+1. Hive also has an 
[{{explode}}|https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-explode]
 function that works like this, but it is very difficult to use at the table 
level: you need to use something called a [lateral 
view|https://cwiki.apache.org/confluence/display/hive/languagemanual+lateralview]
 to do that, and the API is very unintuitive.

[~jorisvandenbossche] I think your example in the previous comment is exactly 
correct. It would be very nice to have an {{explode_table}} kernel like that in 
the Arrow C++ library, exposed to Python and R through bindings.

In addition to working on ListArrays as in this example, {{explode}} should 
also work on MapArrays. When called on a MapArray, it should return two 
exploded columns: one with the keys and one with the values.

> [Python] Explode array column
> -----------------------------
>
>                 Key: ARROW-12099
>                 URL: https://issues.apache.org/jira/browse/ARROW-12099
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Malthe Borch
>            Priority: Major
>
> In Apache Spark, 
> [explode|https://spark.apache.org/docs/latest/api/sql/index.html#explode] 
> separates the elements of an array column (or expression) into multiple rows.
> Note that each explode works at the top level only (not recursively).
> Combined with the existing 
> [flatten|https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.flatten]
>  method, this would allow fully unnesting a 
> [pyarrow.StructArray|https://arrow.apache.org/docs/python/generated/pyarrow.StructArray.html#pyarrow-structarray].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)