felipecrv commented on code in PR #35750:
URL: https://github.com/apache/arrow/pull/35750#discussion_r1218554815


##########
cpp/src/arrow/compute/kernels/vector_selection_internal.cc:
##########
@@ -82,6 +85,89 @@ Status PreallocatePrimitiveArrayData(KernelContext* ctx, 
int64_t length, int bit
 
 namespace {
 
+/// \brief Iterate over a REE filter, emitting ranges of a plain values array 
that
+/// would pass the filter.
+///
+/// Differently from REExREE, and REExPlain filtering, PlainxREE filtering
+/// does not produce a REE output, but rather a plain output array. As such 
it's
+/// much simpler.
+///
+/// \param filter_may_have_nulls Only pass false if you know the filter has no 
nulls.
+template <typename FilterRunEndType>
+void VisitPlainxREEFilterOutputSegmentsImpl(
+    const ArraySpan& filter, bool filter_may_have_nulls,
+    FilterOptions::NullSelectionBehavior null_selection,
+    const EmitREEFilterSegment& emit_segment) {
+  using FilterRunEndCType = typename FilterRunEndType::c_type;
+  const ArraySpan& filter_values = arrow::ree_util::ValuesArray(filter);
+  const int64_t filter_values_offset = filter_values.offset;
+  const uint8_t* filter_is_valid = filter_values.buffers[0].data;
+  const uint8_t* filter_selection = filter_values.buffers[1].data;
+  filter_may_have_nulls = filter_may_have_nulls && filter_is_valid != NULLPTR 
&&
+                          filter_values.null_count != 0;
+
+  const arrow::ree_util::RunEndEncodedArraySpan<FilterRunEndCType> 
filter_span(filter);
+  auto it = filter_span.begin();
+  if (filter_may_have_nulls) {
+    if (null_selection == FilterOptions::EMIT_NULL) {
+      while (!it.is_end(filter_span)) {
+        const int64_t i = filter_values_offset + it.index_into_array();
+        const bool valid = bit_util::GetBit(filter_is_valid, i);
+        const bool emit = !valid || bit_util::GetBit(filter_selection, i);
+        if (emit) {
+          emit_segment(it.logical_position(), it.run_length(), valid);

Review Comment:
   I developed the REExREE kernels first and measured the binary-size impact of 
using template-param lambdas: my compilation unit was the biggest in the whole 
project, so I decided to migrate to `std::function` to save multiple MBs in 
binary size. 
   
   
![image](https://github.com/apache/arrow/assets/207795/18cb1271-04c9-4737-a68a-56301fde3ebf)
   
   The PlainxREE kernel is simpler, so maybe the impact wouldn't be so bad.
   
   > Of course, ideally a REE-encoded array has long enough runs to make REE 
encoding worthwhile...
   
   Exactly. So I think it's better to start with `std::function` to avoid 
inflating the library size. If this gains adoption, we can revisit the kernels 
later.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to