rtpsw commented on code in PR #14352:
URL: https://github.com/apache/arrow/pull/14352#discussion_r1027058005


##########
cpp/src/arrow/compute/exec/aggregate_node.cc:
##########
@@ -497,7 +662,14 @@ class GroupByNode : public ExecNode {
 
     int64_t num_output_batches = bit_util::CeilDiv(out_data_.length, 
output_batch_size());
     outputs_[0]->InputFinished(this, static_cast<int>(num_output_batches));
-    RETURN_NOT_OK(plan_->StartTaskGroup(output_task_group_id_, 
num_output_batches));
+    if (is_last) {
+      RETURN_NOT_OK(plan_->StartTaskGroup(output_task_group_id_, 
num_output_batches));
+    } else {
+      for (int64_t i = 0; i < num_output_batches; i++) {
+        OutputNthBatch(i);

Review Comment:
   The intention is for the gated shared locks to hold the multi-threaded 
aggregation of the next segment while the current aggregation is being finished 
by a dedicated thread. In other words, each end-of-segment is a join-point. If 
your review finds that the locks are more restrictive then I agree it should be 
fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to