[GitHub] [arrow] pitrou commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

GitBox Tue, 01 Jun 2021 01:20:41 -0700


pitrou commented on a change in pull request #10421:
URL: https://github.com/apache/arrow/pull/10421#discussion_r642886129




##########
File path: cpp/src/arrow/util/thread_pool.h
##########
@@ -288,6 +288,10 @@ class ARROW_EXPORT ThreadPool : public Executor {
   // tasks are finished.
   Status Shutdown(bool wait = true);
 
+  // Waits for the thread pool to reach a quiet state where all workers are

Review comment:
       "Wait"

##########
File path: cpp/src/arrow/util/thread_pool_benchmark.cc
##########
@@ -103,6 +103,52 @@ static void ThreadPoolSpawn(benchmark::State& state) {  // 
NOLINT non-const refe
   state.SetItemsProcessed(state.iterations() * nspawns);
 }
 
+// The ThreadPoolSpawn benchmark submits all tasks from a single outside 
thread.  This
+// ends up causing a worst-case scenario for the current simple thread pool.  
All threads
+// compete over the task queue mutex trying to grab the next thread off the 
queue and the
+// result is a large amount of contention.
+//
+// By spreading out the scheduling across multiple threads we can help reduce 
that
+// contention.  This benchmark demonstrates the ideal case where we are able 
to perfectly
+// partition the scheduling across the available threads.
+//
+// Both situations could be encountered (the thread pool can't choose how it 
is used) but
+// by having both benchmarks we can express the importance of distributed 
scheduling.
+static void ThreadPoolIdealSpawn(benchmark::State& state) {  // NOLINT 
non-const reference
+  const auto nthreads = static_cast<int>(state.range(0));
+  const auto workload_size = static_cast<int32_t>(state.range(1));
+
+  Workload workload(workload_size);
+
+  // Spawn enough tasks to make the pool start up overhead negligible
+  const int32_t nspawns = 200000000 / workload_size + 1;
+  const int32_t nspawns_per_thread = nspawns / nthreads;
+
+  for (auto _ : state) {
+    state.PauseTiming();
+    std::shared_ptr<ThreadPool> pool;
+    pool = *ThreadPool::Make(nthreads);
+    state.ResumeTiming();
+
+    for (int32_t i = 0; i < nthreads; ++i) {
+      // Pass the task by reference to avoid copying it around
+      ABORT_NOT_OK(pool->Spawn([&pool, &workload, nspawns_per_thread] {
+        for (int32_t j = 0; j < nspawns_per_thread; j++) {
+          ABORT_NOT_OK(pool->Spawn(std::ref(workload)));
+        }
+      }));
+    }
+
+    // Wait for all tasks to finish
+    pool->WaitForIdle();

Review comment:
       What's the point, since you're calling `Shutdown(wait=true)` just below?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] pitrou commented on a change in pull request #10421: ARROW-12903: [C++] Create new thread pool benchmark demonstrating the "scheduling" bottleneck

Reply via email to