zanmato1984 commented on code in PR #44053:
URL: https://github.com/apache/arrow/pull/44053#discussion_r1763387381
##########
cpp/src/arrow/acero/aggregate_benchmark.cc:
##########
@@ -866,5 +887,61 @@
BENCHMARK(TDigestKernelDoubleMedian)->Apply(QuantileKernelArgs);
BENCHMARK(TDigestKernelDoubleDeciles)->Apply(QuantileKernelArgs);
BENCHMARK(TDigestKernelDoubleCentiles)->Apply(QuantileKernelArgs);
+//
+// Segmented Aggregate
+//
+
+static void BenchmarkSegmentedAggregate(
+ benchmark::State& state, int64_t num_rows, std::vector<Aggregate> aggregates,
+ const std::vector<std::shared_ptr<Array>>& arguments,
+ const std::vector<std::shared_ptr<Array>>& keys, int64_t num_segment_keys,
+ int64_t num_segments) {
+ ASSERT_GT(num_segments, 0);
+
+ auto rng = random::RandomArrayGenerator(42);
+ auto segment_key = rng.Int64(num_rows, /*min=*/0, /*max=*/num_segments - 1);
+ int64_t* values = segment_key->data()->GetMutableValues<int64_t>(1);
+ std::sort(values, values + num_rows);
+ // num_segment_keys copies of the segment key.
+ ArrayVector segment_keys(num_segment_keys, segment_key);
Review Comment:
Ah, I see your point. I just want the number of segments to be exactly as
specified. Combining independently random keys, even ones drawn from the same
distribution, would make the number of segments potentially much larger.
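
To illustrate, here is a minimal standalone sketch (plain C++, no Arrow types) of why copies of one sorted key keep the segment count fixed while independently generated keys can inflate it. `MakeSortedKey` and `CountSegments` are hypothetical helpers written for this comment, not Acero APIs, and the sketch assumes a new segment starts whenever any key column changes between adjacent rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

using KeyColumn = std::vector<int64_t>;

// Hypothetical helper: a sorted column of num_rows values drawn uniformly from
// [0, num_segments - 1], mirroring how the benchmark builds its segment key.
KeyColumn MakeSortedKey(int64_t num_rows, int64_t num_segments, uint64_t seed) {
  assert(num_rows >= 0 && num_segments > 0);
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<int64_t> dist(0, num_segments - 1);
  KeyColumn key(static_cast<size_t>(num_rows));
  for (auto& value : key) value = dist(gen);
  std::sort(key.begin(), key.end());
  return key;
}

// Hypothetical helper: counts segments, assuming a new segment starts at every
// row whose key tuple differs from the previous row's in at least one column.
int64_t CountSegments(const std::vector<KeyColumn>& keys) {
  if (keys.empty() || keys[0].empty()) return 0;
  const size_t num_rows = keys[0].size();
  for (const auto& key : keys) assert(key.size() == num_rows);
  int64_t segments = 1;
  for (size_t row = 1; row < num_rows; ++row) {
    for (const auto& key : keys) {
      if (key[row] != key[row - 1]) {
        ++segments;  // At least one column changed: segment boundary.
        break;
      }
    }
  }
  return segments;
}
```

With `a = MakeSortedKey(n, s, 42)`, `CountSegments({a, a, a})` always equals `CountSegments({a})`, which is at most `s`. With a second key `b` built from a different seed, the boundaries of `{a, b}` are the union of both columns' boundaries, so the count can grow to as much as `CountSegments({a}) + CountSegments({b}) - 1`.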