westonpace commented on code in PR #12755:
URL: https://github.com/apache/arrow/pull/12755#discussion_r849843032
##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -69,6 +70,26 @@ static void SimplifyFilterWithGuarantee(benchmark::State& state, Expression filt
}
}
+static void ExecuteScalarExpressionOverhead(benchmark::State& state, Expression expr) {
+  const auto rows_per_batch = static_cast<int32_t>(state.range(0));
+  const auto num_batches = 10000000 / rows_per_batch;
+
+  ExecContext ctx;
+  auto dataset_schema = schema({
+      field("x", int64()),
+  });
+  ExecBatch input({Datum(ConstantArrayGenerator::Int64(rows_per_batch, 5))},
+                  /*length=*/1);
+
+  ASSIGN_OR_ABORT(auto bound, expr.Bind(*dataset_schema));
+  for (auto _ : state) {
+    for (int it = 0; it < num_batches; ++it)
+      ABORT_NOT_OK(ExecuteScalarExpression(bound, input, &ctx).status());
+  }
+  state.counters["rows_per_second"] = benchmark::Counter(
+      state.iterations() * num_batches * rows_per_batch, benchmark::Counter::kIsRate);
Review Comment:
TL;DR: I thought this was "rows per second across all threads" but I ran an
experiment and it is actually "rows per second per thread". We should add
`->UseRealTime()` to all the benchmarks so that the reported rates become "rows
per second across all threads" and the numbers are more directly comparable.
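For example (just a sketch; the actual `BENCHMARK_CAPTURE` registrations and
argument ranges in this file will differ), each registration would gain the call
like this:
```
// Sketch only: the benchmark name, captured expression, and argument range
// are placeholders.  The relevant addition is ->UseRealTime(), which makes
// kIsRate counters divide by wall-clock time instead of the CPU time
// accumulated across all threads.
BENCHMARK_CAPTURE(ExecuteScalarExpressionOverhead, negate_expr,
                  call("negate", {field_ref("x")}))
    ->ArgNames({"rows_per_batch"})
    ->RangeMultiplier(10)
    ->Range(1000, 1000000)
    ->UseRealTime();
```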
-- Details for anyone interested --
Google's documentation here is very confusing. `kIsRate` is defined as
`Mark the counter as a rate. It will be presented divided by the duration of
the benchmark.` However, "duration of the benchmark" does not mean (as one
might assume) "the amount of time it takes for the benchmark to run" but rather
"the total amount of CPU time burned running the benchmark".
For my experiment I created a benchmark where each iteration busy waits for
0.1s, run with 5 threads and 10 iterations per thread. The benchmark takes 1s
of wall-clock time to run, which I naively assumed was the "duration of the
benchmark":
```
#include <benchmark/benchmark.h>

#include <chrono>
#include <cstdint>

static void BM_CounterTest(benchmark::State& state) {
  int64_t counter = 0;
  for (const auto& _ : state) {
    // Busy wait: each iteration burns roughly 0.1s of CPU time.
    auto start = std::chrono::high_resolution_clock::now();
    while (true) {
      auto now = std::chrono::high_resolution_clock::now();
      auto elapsed = now - start;
      if (elapsed > std::chrono::milliseconds(100)) {
        break;
      }
      counter++;
    }
    benchmark::DoNotOptimize(counter);
  }
  // Report the iteration count as both a plain counter and a rate counter.
  state.counters["custom_counter"] = benchmark::Counter(state.iterations());
  state.counters["rate_counter"] =
      benchmark::Counter(state.iterations(), benchmark::Counter::kIsRate);
}
BENCHMARK(BM_CounterTest)->Threads(5)->Iterations(10);
```
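(Assuming Google Benchmark is installed and the snippet is saved as, say,
`counter_test.cc`, it can be built and run with something like the following;
linking `benchmark_main` supplies `main()`.)
```
g++ -std=c++17 counter_test.cc -lbenchmark -lbenchmark_main -lpthread -o counter_test
./counter_test
```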
The results are:
```
-------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
-------------------------------------------------------------------------------------------
BM_CounterTest/iterations:10/threads:5   20000052 ns     99996137 ns           50 custom_counter=50 rate_counter=10.0004/s
```
The first column, `Time`, is `iteration_duration / num_threads` (0.02s).
The second column, `CPU`, is `iteration_duration` (0.1s).
The custom counter is what we expect: `num_iterations = 50`.
In order for `rate_counter` to be `10`, the counter's divisor must be `5s`,
which is, indeed, "the total amount of CPU time burned across all threads".
Note that this really is "CPU" time: if the benchmark sleeps, the CPU time will
be very small and the rate will be correspondingly large, which is (IMHO)
rather misleading.
If we change to wall clock time (`->UseRealTime()`) the duration becomes
"the time it takes for the benchmark to run":
```
BM_CounterTest/iterations:10/real_time/threads:5   20000049 ns     99997126 ns           50 custom_counter=50 rate_counter=49.9999/s
```
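Putting the two runs side by side, the divisors line up with the numbers above:
```
rate_counter (kIsRate, CPU time):       50 / (5 threads * 10 iterations * 0.1 s) = 50 / 5 s ≈ 10/s
rate_counter (kIsRate + UseRealTime):   50 / (10 iterations * 0.1 s)             = 50 / 1 s ≈ 50/s
```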
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]