ZhangHuiGui commented on code in PR #41975:
URL: https://github.com/apache/arrow/pull/41975#discussion_r1639153037
##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -1034,9 +1034,23 @@ class VectorExecutor : public
KernelExecutorImpl<VectorKernel> {
output_num_buffers_ =
static_cast<int>(output_type_.type->layout().buffers.size());
// Decide if we need to preallocate memory for this kernel
- validity_preallocated_ =
- (kernel_->null_handling != NullHandling::COMPUTED_NO_PREALLOCATE &&
- kernel_->null_handling != NullHandling::OUTPUT_NOT_NULL);
+ validity_preallocated_ = false;
+ if (output_type_.type->id() != Type::NA) {
+ if (kernel_->null_handling == NullHandling::COMPUTED_PREALLOCATE) {
+ // Override the flag if kernel asks for pre-allocation
+ validity_preallocated_ = true;
+ } else if (kernel_->null_handling == NullHandling::INTERSECTION) {
+ bool elide_validity_bitmap = true;
+ for (const auto& arg : batch.values) {
+ auto null_gen = NullGeneralization::Get(arg) ==
NullGeneralization::ALL_VALID;
+
+ // If not all valid, this becomes false
+ elide_validity_bitmap = elide_validity_bitmap && null_gen;
+ }
+ validity_preallocated_ = !elide_validity_bitmap;
+ }
+ }
Review Comment:
> follow a single code path that uses the pre-allocated buffer to put the
result of the bitmaps intersection
Are you talking about `PropagateNulls`? Currently, seems this logic is used
in `INTERSECTION`.
If not, i understand that if the kernel internal execute path must `follow a
single code path that uses the pre-allocated buffer`, `COMPUTED_PREALLOCATE`
can be set statically, which should be safer for this kernel?
Back to preallocation-validity buffer, if kernel-function sets
`NullHandling::INTERSECTION`, does it mean that kernel-function may not be sure
whether it needs a pre-allocated validity buffer and let the Executor decide
for itself? At this time, the Executor can only dynamically determine whether
to set the pre-allocate buffer based on the input data.
> And even if that is not the case (assuming all kernels are perfect at the
moment), this extra logic introduces branching in the possible states of
pre-allocated buffers based on runtime input data and not something statically
defined in the `kernel_` when it's configured.
From a performance perspective, the current logic does introduce many
branches at runtime for `INTERSECTION`, compared to `PropagateNulls`, it
actually passes `NullGeneralization::Get` introduces very few branches.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]