icexelloss commented on code in PR #34311:
URL: https://github.com/apache/arrow/pull/34311#discussion_r1132489205
##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -199,21 +199,39 @@ class ARROW_EXPORT ProjectNodeOptions : public ExecNodeOptions {
std::vector<std::string> names;
};
-/// \brief Make a node which aggregates input batches, optionally grouped by keys.
+/// \brief Make a node which aggregates input batches, optionally grouped by keys and
+/// optionally segmented by segment-keys. Both keys and segment-keys determine the group.
+/// However segment-keys are also used for determining grouping segments, which should be
+/// large, and allow streaming a partial aggregation result after processing each segment.
+/// One common use-case for segment-keys is ordered aggregation, in which the segment-key
+/// attribute specifies a column with non-decreasing values or a lexicographically-ordered
+/// set of such columns.
///
/// If the keys attribute is a non-empty vector, then each aggregate in `aggregates` is
/// expected to be a HashAggregate function. If the keys attribute is an empty vector,
/// then each aggregate is assumed to be a ScalarAggregate function.
+///
+/// If the segment_keys attribute is a non-empty vector, then segmented aggregation, as
+/// described above, applies.
+///
+/// The keys and segment_keys vectors must be disjoint.
+///
+/// See also doc in `aggregate_node.cc`
Review Comment:
@rtpsw Looks like you missed out on this comment (minor issue)
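
For readers unfamiliar with the segmented aggregation described in the doc comment above, here is a minimal sketch of the idea in plain C++. It does not use the Arrow API: `Row`, `Result`, and `SegmentedSum` are hypothetical names made up for illustration, and the sketch assumes a single integer segment key whose values are non-decreasing across the input. Groups are determined by both the segment key and the ordinary key, and each segment's results are emitted as soon as the segment ends, so only one segment's state is held at a time.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical row type for illustration only; this is not an Arrow type.
struct Row {
  int64_t segment_key;  // assumed non-decreasing across the input
  std::string key;      // ordinary grouping key
  int64_t value;        // value to aggregate
};

// One output group: (segment_key, key, sum of values).
using Result = std::tuple<int64_t, std::string, int64_t>;

// Sums `value` per (segment_key, key). When the segment key changes, the
// groups of the finished segment are emitted and the per-key state is cleared.
std::vector<Result> SegmentedSum(const std::vector<Row>& rows) {
  std::vector<Result> out;
  std::map<std::string, int64_t> sums;  // state for the current segment only
  bool have_segment = false;
  int64_t current = 0;

  // Emit the partial result for the current segment and reset the state.
  auto flush = [&]() {
    for (const auto& [key, sum] : sums) out.emplace_back(current, key, sum);
    sums.clear();
  };

  for (const Row& row : rows) {
    if (have_segment && row.segment_key != current) {
      // The sketch relies on ordered input to detect segment boundaries.
      assert(row.segment_key > current && "segment keys must be non-decreasing");
      flush();
    }
    current = row.segment_key;
    have_segment = true;
    sums[row.key] += row.value;
  }
  if (have_segment) flush();  // emit the last segment
  return out;
}
```

In a streaming setting the `flush` step would hand the partial result downstream instead of appending to a vector; the vector is used here only to keep the sketch self-contained.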