rtpsw commented on code in PR #14352:
URL: https://github.com/apache/arrow/pull/14352#discussion_r1027058005
##########
cpp/src/arrow/compute/exec/aggregate_node.cc:
##########
@@ -497,7 +662,14 @@ class GroupByNode : public ExecNode {
int64_t num_output_batches = bit_util::CeilDiv(out_data_.length,
output_batch_size());
outputs_[0]->InputFinished(this, static_cast<int>(num_output_batches));
- RETURN_NOT_OK(plan_->StartTaskGroup(output_task_group_id_,
num_output_batches));
+ if (is_last) {
+ RETURN_NOT_OK(plan_->StartTaskGroup(output_task_group_id_,
num_output_batches));
+ } else {
+ for (int64_t i = 0; i < num_output_batches; i++) {
+ OutputNthBatch(i);
Review Comment:
The intention is for the gated shared locks to hold the multi-threaded
aggregation of the next segment while the current aggregation is being finished
by a dedicated thread. In other words, each end-of-segment is a join-point. If
your review finds that the locks are more restrictive then I agree it should be
fixed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]