[ https://issues.apache.org/jira/browse/ARROW-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810185#comment-16810185 ]
Wes McKinney commented on ARROW-1560:
-------------------------------------

R does match NA (null-ish values), so that should probably be the default:

{code}
> match(c(NA, NA, NA, NA), NA)
[1] 1 1 1 1
{code}

On the second question, I'm not sure. We aren't accounting for nulls in other hash-related functions like ValueCounts. See ARROW-4787.

When you populate the hash table with the right-hand-side values, you can set a flag recording whether null was present (and at what position) and then use this when VisitNull is invoked (if using ArrayDataVisitor turns out to be the most efficient method for this, which I'm also not sure about).

> [C++] Kernel implementations for "match" function
> -------------------------------------------------
>
>                 Key: ARROW-1560
>                 URL: https://issues.apache.org/jira/browse/ARROW-1560
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Preeti Suman
>            Priority: Major
>              Labels: Analytics
>             Fix For: 0.14.0
>
>
> Match computes a position index array from an array of values into a set of
> categories
> {code}
> match(['a', 'b', 'a', null, 'b', 'a', 'b'], ['b', 'a'])
> return [1, 0, 1, null, 0, 1, 0]
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
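
The null-handling idea in the comment (remember while building the hash table whether, and where, a null appeared on the right-hand side, then consult that when a null is visited on the left) can be sketched roughly as below. This is a minimal standard-library illustration only, not Arrow code: MatchSketch, Value and Index are hypothetical names, std::optional stands in for Arrow's validity bitmap, std::unordered_map stands in for Arrow's internal hash table, and ArrayDataVisitor / VisitNull are not used.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: std::nullopt plays the role of a null slot.
using Value = std::optional<std::string>;
using Index = std::optional<std::int32_t>;  // std::nullopt means "no match"

// Sketch of match(values, categories) returning zero-based positions.
std::vector<Index> MatchSketch(const std::vector<Value>& values,
                               const std::vector<Value>& categories) {
  std::unordered_map<std::string, std::int32_t> table;
  // Doubles as the "null was present" flag and its position.
  Index null_position;

  // Populate the table from the right-hand side, noting the first null.
  for (std::size_t i = 0; i < categories.size(); ++i) {
    const auto pos = static_cast<std::int32_t>(i);
    if (!categories[i].has_value()) {
      if (!null_position.has_value()) null_position = pos;
    } else {
      table.emplace(*categories[i], pos);  // emplace keeps the first occurrence
    }
  }

  std::vector<Index> out;
  out.reserve(values.size());
  for (const Value& v : values) {
    if (!v.has_value()) {
      // The "VisitNull" case: matches only if a null was on the right-hand side.
      out.push_back(null_position);
    } else {
      const auto it = table.find(*v);
      out.push_back(it == table.end() ? Index{} : Index{it->second});
    }
  }
  return out;
}
```

With this sketch, the example from the issue description yields a null output slot for the null input, because the categories contain no null. Four nulls matched against a single null all map to position 0, which corresponds to R's one-based result of 1 1 1 1.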