westonpace commented on code in PR #12755:
URL: https://github.com/apache/arrow/pull/12755#discussion_r849843032
##########
cpp/src/arrow/compute/exec/expression_benchmark.cc:
##########
@@ -69,6 +70,26 @@ static void SimplifyFilterWithGuarantee(benchmark::State& state, Expression filt
}
}
+static void ExecuteScalarExpressionOverhead(benchmark::State& state, Expression expr) {
+  const auto rows_per_batch = static_cast<int32_t>(state.range(0));
+  const auto num_batches = 10000000 / rows_per_batch;
+
+  ExecContext ctx;
+  auto dataset_schema = schema({
+      field("x", int64()),
+  });
+  ExecBatch input({Datum(ConstantArrayGenerator::Int64(rows_per_batch, 5))},
+                  /*length=*/1);
+
+  ASSIGN_OR_ABORT(auto bound, expr.Bind(*dataset_schema));
+  for (auto _ : state) {
+    for (int it = 0; it < num_batches; ++it)
+      ABORT_NOT_OK(ExecuteScalarExpression(bound, input, &ctx).status());
+  }
+  state.counters["rows_per_second"] = benchmark::Counter(
+      state.iterations() * num_batches * rows_per_batch, benchmark::Counter::kIsRate);
Review Comment:
TL;DR: I thought this was "rows per second across all threads" but I ran an
experiment and it is actually "rows per second per thread". We should add
`->UseRealTime()` to all the benchmarks so that the reported rates become "rows
per second across all threads" and the numbers are more directly comparable.
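For example (just a sketch; the actual `BENCHMARK_CAPTURE` registrations and
argument ranges in this file will differ), each registration would gain the call
like this:
```
// Sketch only: the benchmark name, captured expression, and argument range
// are placeholders.  The relevant addition is ->UseRealTime(), which makes
// kIsRate counters divide by wall-clock time instead of the CPU time
// accumulated across all threads.
BENCHMARK_CAPTURE(ExecuteScalarExpressionOverhead, negate_expr,
                  call("negate", {field_ref("x")}))
    ->ArgNames({"rows_per_batch"})
    ->RangeMultiplier(10)
    ->Range(1000, 1000000)
    ->UseRealTime();
```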
-- Details for anyone interested --
Google's documentation here is very confusing. `kIsRate` is defined as
`Mark the counter as a rate. It will be presented divided by the duration of
the benchmark.` However, "duration of the benchmark" does not mean (as one
might assume) "the amount of time it takes for the benchmark to run" but rather
"the total amount of CPU time burned running the benchmark".
For my experiment I created a benchmark where each iteration busy waits for
0.1s, run with 5 threads and 10 iterations per thread. The benchmark takes 1s
of wall-clock time to run, which I naively assumed was the "duration of the
benchmark":
```
#include <benchmark/benchmark.h>

#include <chrono>
#include <cstdint>

static void BM_CounterTest(benchmark::State& state) {
  int64_t counter = 0;
  for (const auto& _ : state) {
    // Busy wait: each iteration burns roughly 0.1s of CPU time.
    auto start = std::chrono::high_resolution_clock::now();
    while (true) {
      auto now = std::chrono::high_resolution_clock::now();
      auto elapsed = now - start;
      if (elapsed > std::chrono::milliseconds(100)) {
        break;
      }
      counter++;
    }
    benchmark::DoNotOptimize(counter);
  }
  // Report the iteration count as both a plain counter and a rate counter.
  state.counters["custom_counter"] = benchmark::Counter(state.iterations());
  state.counters["rate_counter"] =
      benchmark::Counter(state.iterations(), benchmark::Counter::kIsRate);
}
BENCHMARK(BM_CounterTest)->Threads(5)->Iterations(10);
```
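(Assuming Google Benchmark is installed and the snippet is saved as, say,
`counter_test.cc`, it can be built and run with something like the following;
linking `benchmark_main` supplies `main()`.)
```
g++ -std=c++17 counter_test.cc -lbenchmark -lbenchmark_main -lpthread -o counter_test
./counter_test
```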
The results are:
```
-------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
-------------------------------------------------------------------------------------------
BM_CounterTest/iterations:10/threads:5   20000052 ns     99996137 ns           50 custom_counter=50 rate_counter=10.0004/s
```
The first column, `Time`, is `iteration_duration / num_threads` (0.02s).
The second column, `CPU`, is `iteration_duration` (0.1s).
The custom counter is what we expect: `num_iterations = 50`.
In order for `rate_counter` to be `10`, the counter's divisor must be `5s`,
which is, indeed, "the total amount of CPU time burned across all threads".
Note that this really is "CPU" time: if the benchmark sleeps, the CPU time will
be very small and the rate will be correspondingly large, which is (IMHO)
rather misleading.
If we change to wall clock time (`->UseRealTime()`) the duration becomes
"the time it takes for the benchmark to run":
```
BM_CounterTest/iterations:10/real_time/threads:5   20000049 ns     99997126 ns           50 custom_counter=50 rate_counter=49.9999/s
```
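Putting the two runs side by side, the divisors line up with the numbers above:
```
rate_counter (kIsRate, CPU time):       50 / (5 threads * 10 iterations * 0.1 s) = 50 / 5 s ≈ 10/s
rate_counter (kIsRate + UseRealTime):   50 / (10 iterations * 0.1 s)             = 50 / 1 s ≈ 50/s
```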
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]