Jin Shang created ARROW-18185:
---------------------------------
Summary: [C++][Compute] Support KEEP_NULL option for
compute::Filter
Key: ARROW-18185
URL: https://issues.apache.org/jira/browse/ARROW-18185
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Jin Shang
The current Filter implementation always drops the values that are filtered
out. In some use cases, the output array must have the same size as the input
array. So I added a new option, FilterOptions::KEEP_NULL, under which the
filtered-out values are kept as nulls.
For example, with input [1, 2, 3] and filter [true, false, true], the current
implementation outputs [1, 3], and with the new option it outputs [1, null, 3].
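
The difference between the two behaviors in the example above can be sketched
without Arrow at all. This is only an illustration of the semantics, not the
Arrow implementation: NullableInts, FilterDrop and FilterKeepNull are
hypothetical names, and a std::vector of std::optional stands in for a
nullable int32 array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for a nullable int32 array; not an Arrow type.
using NullableInts = std::vector<std::optional<int32_t>>;

// Current behavior: slots whose filter value is false are dropped,
// so the output can be shorter than the input.
NullableInts FilterDrop(const NullableInts& values,
                        const std::vector<bool>& filter) {
  assert(values.size() == filter.size());
  NullableInts out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (filter[i]) out.push_back(values[i]);
  }
  return out;
}

// Proposed KEEP_NULL behavior: slots whose filter value is false become
// null, so the output always has the same length as the input.
NullableInts FilterKeepNull(const NullableInts& values,
                            const std::vector<bool>& filter) {
  assert(values.size() == filter.size());
  NullableInts out(values.size());  // every slot starts out as null
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (filter[i]) out[i] = values[i];
  }
  return out;
}
```

With input [1, 2, 3] and filter [true, false, true], FilterDrop returns a
2-element result and FilterKeepNull returns a 3-element result whose middle
slot is null.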
This option is simpler to implement, since we only need to construct a new
validity bitmap and can reuse the input buffers and child arrays. The exception
is dense union arrays, which don't have validity bitmaps.
According to the benchmark results, filtering with FilterOptions::KEEP_NULL is
also faster in most cases. The exception is when the selection percentage is
extremely small, in which case it's cheaper to copy over the selected values.