[ 
https://issues.apache.org/jira/browse/ARROW-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620437#comment-17620437
 ] 

Antoine Pitrou edited comment on ARROW-18097 at 10/19/22 4:11 PM:
------------------------------------------------------------------

Then there probably should be a "list_index" function as well, similar to 
"is_in" vs. "index_in"?
{code}
pc.list_index(arr, "b")
# -> 1, None, 0
{code}
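
To make the intended result concrete, here is a minimal pure-Python sketch of what such a "list_index" kernel might compute. The helper name `list_index_reference` is hypothetical and works on plain Python lists (e.g. the output of `arr.to_pylist()`), not on Arrow arrays. Returning None for a null list is an assumption; the comment above only shows None for "value not found".

```python
from typing import Any, List, Optional, Sequence


def list_index_reference(
    lists: Sequence[Optional[Sequence[Any]]], value: Any
) -> List[Optional[int]]:
    """Return, per list, the position of the first occurrence of `value`.

    Hypothetical reference semantics, not an Arrow API:
    - value not found -> None (as in the comment above)
    - null list       -> None (assumption, not specified in the ticket)
    """
    result: List[Optional[int]] = []
    for sub in lists:
        index = None
        if sub is not None:
            for i, item in enumerate(sub):
                if item == value:
                    index = i
                    break  # only the first match is reported
        result.append(index)
    return result


arr = [["a", "b"], ["a", "c"], ["b", "c", "d"]]
print(list_index_reference(arr, "b"))  # [1, None, 0]
```

A function like this could serve as a test oracle for an actual kernel, in the same way "index_in" results can be checked against "is_in".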



was (Author: pitrou):
Then there probably should be a "list_index" function as well, similar to 
"is_in" vs. "index_in"?
{code}
pc.list_contains(arr, "b")
# -> 1, None, 0
{code}


> [C++] Add a "list_contains" kernel
> ----------------------------------
>
>                 Key: ARROW-18097
>                 URL: https://issues.apache.org/jira/browse/ARROW-18097
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: compute, kernel
>
> Assume you have a list array:
> {code}
> arr = pa.array([["a", "b"], ["a", "c"], ["b", "c", "d"]])
> {code}
> And you want to know for each list if it contains a certain value (of the 
> same type as the list's values). A "list_contains" function (or other name) 
> would be useful for that:
> {code}
> pc.list_contains(arr, "a")
> # -> True, True, False
> {code}
> The current workaround that I found was flattening, checking equality, and 
> then reducing again with groupby, but this is quite tedious:
> {code}
> >>> temp = pa.table({'index': pc.list_parent_indices(arr),
> ...                  'contains_value': pc.equal(pc.list_flatten(arr), "a")})
> >>> temp.group_by('index').aggregate(
> ...     [('contains_value', 'any')])['contains_value_any'].chunk(0)
> <pyarrow.lib.BooleanArray object at 0x7ffaf3f8de20>
> [
>   true,
>   true,
>   false
> ]
> {code}
> But this only works if there are no empty or missing (null) lists, since 
> those contribute no rows to the flattened values.
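
The limitation described in the last sentence of the issue can be illustrated with a small pure-Python stand-in for the flatten-and-group-by workaround. All helper names below are hypothetical, and the code imitates the behaviour of "list_flatten" and "list_parent_indices" on plain Python lists rather than calling pyarrow. Returning None for a null list in the reference function is an assumption, not something the issue specifies.

```python
def flatten_with_parents(lists):
    """Stand-in for list_flatten + list_parent_indices on Python lists.

    Empty and null lists produce no flattened values, so their
    parent index never appears in `parents`.
    """
    values, parents = [], []
    for i, sub in enumerate(lists):
        for item in (sub or []):
            values.append(item)
            parents.append(i)
    return values, parents


def contains_via_groupby(lists, value):
    """Imitate the workaround: compare flat values, then 'any' per parent."""
    values, parents = flatten_with_parents(lists)
    groups = {}
    for parent, item in zip(parents, values):
        groups[parent] = groups.get(parent, False) or (item == value)
    # One output row per parent index that actually occurred.
    return [groups[key] for key in sorted(groups)]


def list_contains_reference(lists, value):
    """Hypothetical reference semantics for a 'list_contains' kernel.

    Empty list -> False; null list -> None (assumption).
    """
    return [
        None if sub is None else any(item == value for item in sub)
        for sub in lists
    ]


arr = [["a", "b"], [], None, ["b", "c", "d"]]
print(contains_via_groupby(arr, "a"))     # [True, False]
print(list_contains_reference(arr, "a"))  # [True, False, None, False]
```

With four input lists the workaround yields only two rows, because the empty and null lists have no entries to group, whereas a dedicated kernel could return one result per input list.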



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
