Re: [PR] GH-39565: [C++] Do not concatenate chunked values of fixed-width types to run "array_take" [arrow]

via GitHub Tue, 20 Aug 2024 08:31:38 -0700


felipecrv commented on code in PR #41700:
URL: https://github.com/apache/arrow/pull/41700#discussion_r1723524869



##########
cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc:
##########
@@ -395,10 +526,45 @@ struct FixedWidthTakeImpl {
     out_arr->null_count = out_arr->length - valid_count;
     return Status::OK();
   }
+
+  static Status ChunkedExec(KernelContext* ctx, const ChunkedArray& values,
+                            const ArraySpan& indices, ArrayData* out_arr,
+                            int64_t factor) {
+    const bool out_has_validity = values.null_count() > 0 || indices.MayHaveNulls();
+
+    ChunkedFixedWidthValuesSpan chunked_values{values};
+    ResolvedIndicesState resolved_idx;
+    RETURN_NOT_OK(resolved_idx.InitWithIndices<IndexCType>(
+        /*chunks=*/values.chunks(), /*idx_length=*/indices.length,
+        /*idx=*/indices.GetValues<IndexCType>(1), ctx->memory_pool()));
+
+    int64_t valid_count = 0;
+    arrow::internal::GatherFromChunks<kValueWidthInBits, IndexCType, WithFactor::value>

Review Comment:
   This is the kind of optimization that depends on the statistical properties
of the input data, so the gains are hard to quantify. Here I assumed that if
there are multiple chunks, values are gathered from more than one of them.
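
To make the idea of taking from chunked values without concatenating them concrete, here is a minimal sketch. It is not the PR's implementation: it uses plain `std::vector` chunks instead of Arrow arrays, ignores validity bitmaps entirely, and the names `ChunkLocation`, `ComputeOffsets`, `Resolve`, and `TakeFromChunks` are hypothetical. Each logical index is resolved to a (chunk, position) pair by binary search over cumulative chunk offsets, and the value is then copied straight from that chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration types; these are NOT Arrow's API.
struct ChunkLocation {
  std::size_t chunk_index;     // which chunk holds the value
  std::size_t index_in_chunk;  // position of the value inside that chunk
};

// offsets[i] is the logical index of the first element of chunk i;
// offsets.back() is the total number of elements across all chunks.
std::vector<int64_t> ComputeOffsets(
    const std::vector<std::vector<int32_t>>& chunks) {
  std::vector<int64_t> offsets;
  offsets.reserve(chunks.size() + 1);
  int64_t total = 0;
  offsets.push_back(total);
  for (const auto& chunk : chunks) {
    total += static_cast<int64_t>(chunk.size());
    offsets.push_back(total);
  }
  return offsets;
}

// Map a logical index to its chunk and in-chunk position by binary search.
ChunkLocation Resolve(const std::vector<int64_t>& offsets,
                      int64_t logical_index) {
  assert(logical_index >= 0 && logical_index < offsets.back());
  // upper_bound returns the first offset greater than logical_index, so the
  // owning chunk is the one just before it. Empty chunks are skipped naturally.
  auto it = std::upper_bound(offsets.begin(), offsets.end(), logical_index);
  const auto chunk_index = static_cast<std::size_t>(it - offsets.begin()) - 1;
  return {chunk_index,
          static_cast<std::size_t>(logical_index - offsets[chunk_index])};
}

// Gather values at the given logical indices without concatenating the chunks.
std::vector<int32_t> TakeFromChunks(
    const std::vector<std::vector<int32_t>>& chunks,
    const std::vector<int64_t>& indices) {
  const std::vector<int64_t> offsets = ComputeOffsets(chunks);
  std::vector<int32_t> out;
  out.reserve(indices.size());
  for (int64_t idx : indices) {
    const ChunkLocation loc = Resolve(offsets, idx);
    out.push_back(chunks[loc.chunk_index][loc.index_in_chunk]);
  }
  return out;
}
```

In this sketch every index pays for a binary search, which is only worthwhile when the indices really do land in several chunks. If nearly all of them fell into a single chunk, a fast path for that chunk could be cheaper, and that is the data-dependent trade-off described above.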



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

