[ https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810185#comment-16810185 ]

Wes McKinney commented on ARROW-1560:
-------------------------------------

R's match does match NA (null-like) values, so that should probably be the default:

{code}
> match(c(NA, NA, NA, NA), NA)
[1] 1 1 1 1
{code}

On the second question, I'm not sure. We aren't accounting for nulls in other
hash-related functions like ValueCounts either (see ARROW-4787). When you
populate the hash table with the right-hand-side values, you can set a flag
recording whether null was present (and at what position), then use it when
VisitNull is invoked (if using ArrayDataVisitor turns out to be the most
efficient method for this, which I'm also not sure about).
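A minimal C++ sketch of the null-flag idea, for illustration only; it is not Arrow's kernel API. `Match`, `NullableStr`, `NullableIdx` and the `match_nulls` option are hypothetical names, std::optional stands in for Arrow's validity bitmap, and std::unordered_map stands in for Arrow's internal hash table. Nulls are kept out of the hash table: the build pass records the position of the first null in the set, and the probe pass's null branch (the analogue of VisitNull) emits that position only when null matching is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical nullable value type: std::nullopt stands in for an Arrow null.
using NullableStr = std::optional<std::string>;
// Output slot: std::nullopt means a null in the result array.
using NullableIdx = std::optional<int32_t>;

// Hypothetical "match": for each element of `values`, return the 0-based
// position of its first occurrence in `value_set`, or null if it is absent.
// If `match_nulls` is true (R-like), a null input matches the first null in
// `value_set`; otherwise a null input always yields a null output.
std::vector<NullableIdx> Match(const std::vector<NullableStr>& values,
                               const std::vector<NullableStr>& value_set,
                               bool match_nulls) {
  // Positions are stored as int32_t, so the set must fit in that range.
  assert(value_set.size() <= static_cast<std::size_t>(INT32_MAX));

  // Build phase: hash the right-hand-side values.
  std::unordered_map<std::string, int32_t> memo;
  // Flag + position for null, tracked outside the hash table.
  std::optional<int32_t> null_position;
  for (int32_t i = 0; i < static_cast<int32_t>(value_set.size()); ++i) {
    if (value_set[i].has_value()) {
      memo.emplace(*value_set[i], i);  // emplace keeps the first occurrence
    } else if (!null_position.has_value()) {
      null_position = i;  // remember only the first null
    }
  }

  // Probe phase: look up each input value.
  std::vector<NullableIdx> out;
  out.reserve(values.size());
  for (const NullableStr& v : values) {
    if (!v.has_value()) {
      // Analogue of the VisitNull case: consult the flag, not the table.
      out.push_back(match_nulls ? null_position : NullableIdx{});
      continue;
    }
    auto it = memo.find(*v);
    out.push_back(it != memo.end() ? NullableIdx{it->second} : NullableIdx{});
  }
  return out;
}
```

With `match_nulls = false`, the example from the issue description (values ['a', 'b', 'a', null, 'b', 'a', 'b'] against ['b', 'a']) gives [1, 0, 1, null, 0, 1, 0]. With `match_nulls = true`, four nulls matched against a set holding a single null give [0, 0, 0, 0], the 0-based analogue of R's `1 1 1 1`.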

> [C++] Kernel implementations for "match" function
> -------------------------------------------------
>
>                 Key: ARROW-1560
>                 URL: https://issues.apache.org/jira/browse/ARROW-1560
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Preeti Suman
>            Priority: Major
>              Labels: Analytics
>             Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of
> categories:
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
