felipecrv commented on code in PR #41700:
URL: https://github.com/apache/arrow/pull/41700#discussion_r1723524869
##########
cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc:
##########
@@ -395,10 +526,45 @@ struct FixedWidthTakeImpl {
out_arr->null_count = out_arr->length - valid_count;
return Status::OK();
}
+
+ static Status ChunkedExec(KernelContext* ctx, const ChunkedArray& values,
+ const ArraySpan& indices, ArrayData* out_arr,
+ int64_t factor) {
+ const bool out_has_validity = values.null_count() > 0 || indices.MayHaveNulls();
+
+ ChunkedFixedWidthValuesSpan chunked_values{values};
+ ResolvedIndicesState resolved_idx;
+ RETURN_NOT_OK(resolved_idx.InitWithIndices<IndexCType>(
+ /*chunks=*/values.chunks(), /*idx_length=*/indices.length,
+ /*idx=*/indices.GetValues<IndexCType>(1), ctx->memory_pool()));
+
+ int64_t valid_count = 0;
+ arrow::internal::GatherFromChunks<kValueWidthInBits, IndexCType, WithFactor::value>
Review Comment:
This is the kind of optimization that depends on statistical properties of
the input data, so the gains are hard to quantify. Here I went with the
assumption that if there are multiple chunks, the indices gather values from
more than one of them.
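
For readers following the thread, here is a minimal sketch of the resolve-then-gather pattern that the hunk above appears to follow. It is not the PR's code and does not use Arrow's API: `ChunkLocation`, `MakeOffsets`, `Resolve`, and `TakeFromChunks` are hypothetical names, chunks are plain `std::vector<int32_t>`, and nulls are ignored. It only illustrates why resolving every index to a (chunk, position) pair up front is reasonable when indices are assumed to land in several chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a resolved index: which chunk, and where in it.
struct ChunkLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total logical length.
std::vector<int64_t> MakeOffsets(const std::vector<std::vector<int32_t>>& chunks) {
  std::vector<int64_t> offsets{0};
  for (const auto& chunk : chunks) {
    offsets.push_back(offsets.back() + static_cast<int64_t>(chunk.size()));
  }
  return offsets;
}

// Resolve one logical index with a binary search over the offsets.
// Empty chunks are skipped because upper_bound moves past equal offsets.
ChunkLocation Resolve(const std::vector<int64_t>& offsets, int64_t logical_index) {
  assert(logical_index >= 0 && logical_index < offsets.back());
  auto it = std::upper_bound(offsets.begin(), offsets.end(), logical_index);
  const int64_t chunk = static_cast<int64_t>(it - offsets.begin()) - 1;
  return {chunk, logical_index - offsets[chunk]};
}

// Pass 1 resolves every index; pass 2 gathers from whichever chunk it hit.
std::vector<int32_t> TakeFromChunks(const std::vector<std::vector<int32_t>>& chunks,
                                    const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = MakeOffsets(chunks);
  std::vector<ChunkLocation> resolved;
  resolved.reserve(indices.size());
  for (int64_t idx : indices) {
    resolved.push_back(Resolve(offsets, idx));
  }

  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (const ChunkLocation& loc : resolved) {
    out.push_back(chunks[loc.chunk_index][loc.index_in_chunk]);
  }
  return out;
}
```

If all indices happened to fall in a single chunk, the resolution pass would be wasted work compared with a plain take on that chunk, which is one way the benefit depends on the input distribution.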