felipecrv commented on code in PR #35750:
URL: https://github.com/apache/arrow/pull/35750#discussion_r1218554815
##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -82,6 +85,89 @@ Status PreallocatePrimitiveArrayData(KernelContext* ctx,
int64_t length, int bit
namespace {
+/// \brief Iterate over a REE filter, emitting ranges of a plain values array
that
+/// would pass the filter.
+///
+/// Differently from REExREE, and REExPlain filtering, PlainxREE filtering
+/// does not produce a REE output, but rather a plain output array. As such
it's
+/// much simpler.
+///
+/// \param filter_may_have_nulls Only pass false if you know the filter has no
nulls.
+template <typename FilterRunEndType>
+void VisitPlainxREEFilterOutputSegmentsImpl(
+ const ArraySpan& filter, bool filter_may_have_nulls,
+ FilterOptions::NullSelectionBehavior null_selection,
+ const EmitREEFilterSegment& emit_segment) {
+ using FilterRunEndCType = typename FilterRunEndType::c_type;
+ const ArraySpan& filter_values = arrow::ree_util::ValuesArray(filter);
+ const int64_t filter_values_offset = filter_values.offset;
+ const uint8_t* filter_is_valid = filter_values.buffers[0].data;
+ const uint8_t* filter_selection = filter_values.buffers[1].data;
+ filter_may_have_nulls = filter_may_have_nulls && filter_is_valid != NULLPTR
&&
+ filter_values.null_count != 0;
+
+ const arrow::ree_util::RunEndEncodedArraySpan<FilterRunEndCType>
filter_span(filter);
+ auto it = filter_span.begin();
+ if (filter_may_have_nulls) {
+ if (null_selection == FilterOptions::EMIT_NULL) {
+ while (!it.is_end(filter_span)) {
+ const int64_t i = filter_values_offset + it.index_into_array();
+ const bool valid = bit_util::GetBit(filter_is_valid, i);
+ const bool emit = !valid || bit_util::GetBit(filter_selection, i);
+ if (emit) {
+ emit_segment(it.logical_position(), it.run_length(), valid);
Review Comment:
I developed the REExREE kernels first and measured the binary-size impact of
using template-param lambdas: my compilation unit was the biggest in the whole
project, so I decided to migrate to `std::function` to save multiple MBs in
binary size.

The PlainxREE kernel is simpler, so maybe the impact wouldn't be so bad.
> Of course, ideally a REE-encoded array has long enough runs to make REE
encoding worthwhile...
Exactly. So I think it's better to start with `std::function` to avoid
inflating the library size. If this gains adoption, we can revisit the kernels
later.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]