[ https://issues.apache.org/jira/browse/ARROW-5760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Antoine Pitrou updated ARROW-5760:
----------------------------------
Fix Version/s: 2.0.0
> [C++] Optimize Take and Filter
> ------------------------------
>
> Key: ARROW-5760
> URL: https://issues.apache.org/jira/browse/ARROW-5760
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Benjamin Kietzman
> Assignee: Benjamin Kietzman
> Priority: Major
> Fix For: 2.0.0
>
>
> There is some question of whether these kernels allocate optimally. For
> example, when Filtering or Taking strings it might be more efficient to pass
> over the filter/indices twice: first to determine how much character storage
> will be needed, then again to copy the characters into the allocated memory:
> https://github.com/apache/arrow/pull/4531#discussion_r297160457
>
> Additionally, these kernels could probably make good use of scatter/gather
> SIMD instructions.
>
> Furthermore, Filter's bitmap is currently lazily expanded into the indices of
> elements to be appended to the output array. It would probably be more
> efficient to expand the bitmap into indices one batch at a time, then gather
> the selected values using each index batch.
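A minimal C++ sketch of the two-pass idea for Taking strings. It assumes a simplified, hypothetical StringColumn (int32 offsets plus one contiguous character buffer, loosely modeled on Arrow's string layout, with nulls ignored) and is not Arrow's kernel API. The first pass over the indices sums the selected string lengths so the character buffer is allocated exactly once; the second pass copies the characters and writes the output offsets.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified string column (not Arrow's API): string i is
// data[offsets[i], offsets[i + 1]). Null handling is omitted.
struct StringColumn {
  std::vector<int32_t> offsets;  // length() + 1 entries, offsets[0] == 0
  std::string data;              // all characters, concatenated
  int64_t length() const { return static_cast<int64_t>(offsets.size()) - 1; }
};

StringColumn TakeStrings(const StringColumn& in, const std::vector<int32_t>& indices) {
  // Pass 1: sum the lengths of the selected strings so the character
  // buffer can be allocated exactly once.
  int64_t total_chars = 0;
  for (int32_t idx : indices) {
    assert(idx >= 0 && idx < in.length());
    total_chars += in.offsets[idx + 1] - in.offsets[idx];
  }
  if (total_chars > std::numeric_limits<int32_t>::max()) {
    throw std::length_error("output does not fit int32 offsets");
  }

  StringColumn out;
  out.offsets.resize(indices.size() + 1);
  out.data.resize(static_cast<size_t>(total_chars));  // single allocation

  // Pass 2: copy characters into the preallocated buffer and fill offsets.
  int32_t pos = 0;
  out.offsets[0] = 0;
  for (size_t i = 0; i < indices.size(); ++i) {
    const int32_t start = in.offsets[indices[i]];
    const int32_t len = in.offsets[indices[i] + 1] - start;
    if (len > 0) {
      std::memcpy(&out.data[static_cast<size_t>(pos)], in.data.data() + start,
                  static_cast<size_t>(len));
    }
    pos += len;
    out.offsets[i + 1] = pos;
  }
  return out;
}
```

The same structure would apply to Filter: the first pass would sum lengths over the positions selected by the bitmap instead of over explicit indices.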
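As an illustration of the scatter/gather point, here is a sketch of Take on int32 values using the AVX2 gather intrinsic _mm256_i32gather_epi32, with a scalar fallback. The function names are hypothetical, and the code assumes GCC or Clang on x86 (it relies on __attribute__((target("avx2"))) and __builtin_cpu_supports for runtime dispatch); indices are assumed to be valid and non-negative, and nulls are ignored.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define TAKE_HAVE_X86 1
#endif

// Scalar reference: out[i] = values[indices[i]].
void TakeInt32Scalar(const int32_t* values, const int32_t* indices, int64_t n,
                     int32_t* out) {
  for (int64_t i = 0; i < n; ++i) out[i] = values[indices[i]];
}

#ifdef TAKE_HAVE_X86
// AVX2 version: one gather instruction loads 8 int32 values. The target
// attribute (GCC/Clang) enables AVX2 for this function only, so the rest of
// the file does not need to be compiled with -mavx2.
__attribute__((target("avx2")))
void TakeInt32Avx2(const int32_t* values, const int32_t* indices, int64_t n,
                   int32_t* out) {
  int64_t i = 0;
  for (; i + 8 <= n; i += 8) {
    const __m256i idx =
        _mm256_loadu_si256(reinterpret_cast<const __m256i*>(indices + i));
    // Scale 4 == sizeof(int32_t): lane k loads values[idx[k]].
    const __m256i v =
        _mm256_i32gather_epi32(reinterpret_cast<const int*>(values), idx, 4);
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i), v);
  }
  // Handle the remaining (fewer than 8) elements with the scalar loop.
  TakeInt32Scalar(values, indices + i, n - i, out + i);
}
#endif

// Dispatches to the AVX2 path when the running CPU supports it.
void TakeInt32(const int32_t* values, const int32_t* indices, int64_t n,
               int32_t* out) {
  assert(n >= 0);
#ifdef TAKE_HAVE_X86
  if (__builtin_cpu_supports("avx2")) {
    TakeInt32Avx2(values, indices, n, out);
    return;
  }
#endif
  TakeInt32Scalar(values, indices, n, out);
}
```

Whether hardware gathers actually beat the scalar loop depends on the CPU and on how scattered the indices are, so a change like this would need benchmarking before being adopted.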
--
This message was sent by Atlassian Jira
(v8.3.2#803003)