rtpsw commented on code in PR #34311:
URL: https://github.com/apache/arrow/pull/34311#discussion_r1125649916


##########
cpp/src/arrow/compute/exec/aggregate_node.cc:
##########
@@ -440,12 +613,29 @@ class GroupByNode : public ExecNode, public TracedNode {
       int key_field_id = key_field_ids[i];
       output_fields[base + i] = input_schema->field(key_field_id);
     }
+    base += keys.size();
+    for (size_t i = 0; i < segment_keys.size(); ++i) {
+      int segment_key_field_id = segment_key_field_ids[i];
+      output_fields[base + i] = input_schema->field(segment_key_field_id);
+    }
 
     return input->plan()->EmplaceNode<GroupByNode>(
         input, schema(std::move(output_fields)), std::move(key_field_ids),
+        std::move(segment_key_field_ids), std::move(segmenter), std::move(agg_src_types),
         std::move(agg_src_fieldsets), std::move(aggs), std::move(agg_kernels));
   }
 
+  Status ResetAggregates() {

Review Comment:
   `ScalarAggregateNode` can be seen as a no-keys optimized version of 
`GroupByNode`. This PR adds support for segment keys to both of these nodes. 
Note that pre-PR each node output a single batch: `ScalarAggregateNode`'s 
batch had a single row and `GroupByNode`'s had one row per group. Post-PR, 
both nodes output multiple batches of the same structure.


