felipecrv commented on code in PR #41975:
URL: https://github.com/apache/arrow/pull/41975#discussion_r1639939110


##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -1034,9 +1034,23 @@ class VectorExecutor : public KernelExecutorImpl<VectorKernel> {
     output_num_buffers_ = static_cast<int>(output_type_.type->layout().buffers.size());
 
     // Decide if we need to preallocate memory for this kernel
-    validity_preallocated_ =
-        (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
-         kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL);
+    validity_preallocated_ = false;
+    if (output_type_.type->id() != Type::NA) {
+      if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+        // Override the flag if kernel asks for pre-allocation
+        validity_preallocated_ = true;
+      } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+        bool elide_validity_bitmap = true;
+        for (const auto& arg : batch.values) {
+          auto null_gen = NullGeneralization::Get(arg) == NullGeneralization::ALL_VALID;
+
+          // If not all valid, this becomes false
+          elide_validity_bitmap = elide_validity_bitmap && null_gen;
+        }
+        validity_preallocated_ = !elide_validity_bitmap;
+      }
+    }

Review Comment:
   Sorry, by "branches" I meant branches in the state space that the execution 
framework and the kernels might traverse. That is, unless test cases are 
duplicated for every kernel, a latent bug could stay hidden until an unlikely 
array configuration causes a SIGSEGV on a validity buffer that was not 
pre-allocated.
   
   > Back to preallocation-validity buffer, if kernel-function sets 
`NullHandling::INTERSECTION`, does it mean that kernel-function may not be sure 
whether it needs a pre-allocated validity buffer and let the Executor decide 
for itself? At this time, the Executor can only dynamically determine whether 
to set the pre-allocate buffer based on the input data.
   
   By "statically" I mean a one-time configuration of the `kernel_`, made when 
it's added to the function in the registry.
   
   For a kernel implementer, it's simpler to assume that 
`NullHandling::INTERSECTION` implies a pre-allocated validity bitmap buffer, no 
matter what arrays are passed as input.
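
To make the contrast concrete, here is a minimal sketch of the two policies. It uses simplified stand-in types, not the real `arrow::compute` classes: `FakeInput`, `PreallocateValidityStatic` and `PreallocateValidityDynamic` are hypothetical names introduced only for illustration, and the `Type::NA` check from the diff is left out. The static version mirrors the condition removed by the diff; the dynamic version mirrors the added one.

```cpp
#include <vector>

// Simplified stand-in for the kernel's null-handling setting.
// The enumerator names follow the diff; everything else is hypothetical.
enum class NullHandling {
  INTERSECTION,
  COMPUTED_PREALLOCATE,
  COMPUTED_NO_PREALLOCATE,
  OUTPUT_NOT_NULL
};

// Hypothetical stand-in for an input value: only records whether the
// input is known to contain no nulls.
struct FakeInput {
  bool all_valid;
};

// Static policy (the condition removed by the diff): the answer depends
// only on how the kernel was configured, never on the input arrays.
bool PreallocateValidityStatic(NullHandling nh) {
  return nh != NullHandling::COMPUTED_NO_PREALLOCATE &&
         nh != NullHandling::OUTPUT_NOT_NULL;
}

// Dynamic policy (the condition added by the diff, minus the Type::NA
// check): for INTERSECTION the answer also depends on the inputs.
bool PreallocateValidityDynamic(NullHandling nh,
                                const std::vector<FakeInput>& inputs) {
  if (nh == NullHandling::COMPUTED_PREALLOCATE) {
    return true;
  }
  if (nh == NullHandling::INTERSECTION) {
    for (const auto& in : inputs) {
      if (!in.all_valid) {
        return true;  // at least one input may contain nulls
      }
    }
    return false;  // every input is all-valid: the bitmap is elided
  }
  return false;
}
```

Under the dynamic policy, an `INTERSECTION` kernel sees a null validity buffer only when every input happens to be all-valid. A kernel that writes to the bitmap unconditionally would pass any test that includes an input with nulls and crash only on the all-valid configuration, which is the hidden branch described above.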



