[ https://issues.apache.org/jira/browse/ARROW-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17217214#comment-17217214 ]
David Sherrier commented on ARROW-5409:
---------------------------------------
Hey Wes, I added a benchmark (attached here) and found that, at least with my
implementation of a vector<pair<T,index>>, it only outperforms our current
implementation when the right-side list is between 1 and 4 elements long.
Keep in mind that I ran the benchmark on my laptop, so it may perform better
on a more powerful machine.
Benchmark code: https://github.com/david1437/arrow/tree/ARROW-5394
Vector Implementation with benchmark:
https://github.com/david1437/arrow/tree/ARROW-5409
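A minimal Python sketch of the vector-of-pairs idea, assuming each entry pairs a distinct right-side value with the position of its first occurrence (one possible reading of `index`). `SmallLookup`, `insert`, and `find` are hypothetical names, not Arrow API, and this is not the code in the linked branch. Lookup is a plain linear scan with no hashing, which is why such a structure can only be expected to win for very short lists.

```python
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class SmallLookup(Generic[T]):
    """Hypothetical small-set lookup: (value, index) pairs scanned linearly.

    Mirrors the vector<pair<T, index>> idea; not an Arrow API.
    """

    def __init__(self) -> None:
        self._entries: List[Tuple[T, int]] = []

    def insert(self, value: T, index: int) -> None:
        # Keep only the first occurrence so each value maps to one index.
        if self.find(value) is None:
            self._entries.append((value, index))

    def find(self, value: T) -> Optional[int]:
        # Linear scan: O(n) equality comparisons, no hashing.
        for stored, index in self._entries:
            if stored == value:
                return index
        return None

    def __len__(self) -> int:
        return len(self._entries)
```

For example, inserting `[7, 3, 7, 9]` with their positions leaves three entries; `find(9)` returns `3` and `find(5)` returns `None`.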
> [C++] Improvement for IsIn Kernel when right array is small
> -----------------------------------------------------------
>
> Key: ARROW-5409
> URL: https://issues.apache.org/jira/browse/ARROW-5409
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Preeti Suman
> Assignee: David Sherrier
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: set_lookup_benchmark
>
>
> The core of the algorithm (as Python) is
> {code:python}
> for i, elem in enumerate(array):
>     output[i] = elem in memo_table
> {code}
> Often the right operand list will be very small; in this case, the hash table
> should be replaced with a constant vector.
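The loop above, made runnable with the size-based switch the issue proposes. This is a minimal sketch under stated assumptions: `is_in` and `SMALL_SET_THRESHOLD` are hypothetical names, the cutoff of 4 is borrowed from the 1 to 4 element crossover reported in the comment (and is machine-dependent), a tuple stands in for the constant vector and a Python `set` for the hash table, and Arrow's null handling is ignored.

```python
from typing import Hashable, List, Sequence

# Hypothetical cutoff taken from the 1-4 element crossover reported above;
# the real value depends on the machine and the value type.
SMALL_SET_THRESHOLD = 4


def is_in(array: Sequence[Hashable], value_set: Sequence[Hashable]) -> List[bool]:
    """Return a mask where output[i] is True if array[i] occurs in value_set."""
    if len(value_set) <= SMALL_SET_THRESHOLD:
        # Small right operand: a constant tuple; `in` does a linear scan.
        memo_table = tuple(value_set)
    else:
        # Larger right operand: a hash table with O(1) average lookups.
        memo_table = set(value_set)
    output = [False] * len(array)
    for i, elem in enumerate(array):
        output[i] = elem in memo_table
    return output
```

For example, `is_in([1, 2, 3, 4], [2, 4])` takes the linear-scan path and returns `[False, True, False, True]`.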