zagto commented on code in PR #13138:
URL: https://github.com/apache/arrow/pull/13138#discussion_r886730951
##########
cpp/src/arrow/compute/exec.cc:
##########
@@ -308,16 +308,30 @@ bool ExecBatchIterator::Next(ExecBatch* batch) {
// Now, fill the batch
batch->values.resize(args_.size());
batch->length = iteration_size;
- for (size_t i = 0; i < args_.size(); ++i) {
- if (args_[i].is_scalar()) {
- batch->values[i] = args_[i].scalar();
- } else if (args_[i].is_array()) {
- batch->values[i] = args_[i].array()->Slice(position_, iteration_size);
- } else {
- const ChunkedArray& carr = *args_[i].chunked_array();
- const auto& chunk = carr.chunk(chunk_indexes_[i]);
- batch->values[i] = chunk->data()->Slice(chunk_positions_[i],
iteration_size);
- chunk_positions_[i] += iteration_size;
+
+ if (iteration_size == length_) {
+ ARROW_DCHECK_EQ(position_, 0);
+ for (size_t i = 0; i < args_.size(); ++i) {
+ if (args_[i].kind() == Datum::CHUNKED_ARRAY) {
+ const ChunkedArray& carr = *args_[i].chunked_array();
+ batch->values[i] = Datum(carr.chunk(chunk_indexes_[i])->data());
+ chunk_positions_[i] += iteration_size;
+ } else {
+ batch->values[i] = std::move(args_[i]);
+ }
Review Comment:
I think you're right about not having to update `chunk_positions_[i]`. But
the suggested change wouldn't work. Moving the chunked array Datum directly
into batch->values means it ends up getting passed to the kernel, which usually
expects an ARRAY type Datum. The important part why we need a branch here is
extracting the ArrayData from the chunked array
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]