[PR] bench: demonstrate tokio-uring thread-per-core execution in dfbench [datafusion]

via GitHub Sat, 18 Apr 2026 05:25:28 -0700


Dandandan opened a new pull request, #21716:
URL: https://github.com/apache/datafusion/pull/21716


   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   Explores thread-per-core execution using `tokio-uring` in the benchmark 
harness. Each worker runs its own `tokio_uring::start` runtime (a 
current-thread Tokio reactor driven by `io_uring`), so Parquet I/O and decode 
can be scheduled on the same core that will consume the bytes — avoiding 
cross-thread hops that the default multi-threaded Tokio runtime introduces.
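The thread-per-core layout can be approximated with the standard library alone. This is a minimal sketch that assumes plain OS threads and `std::sync::mpsc` queues in place of per-thread `tokio_uring::start` runtimes. `WorkerPool`, `run_demo`, and the channel layout are hypothetical and are not the PR's `util::tokio_uring_pool` API. The sketch shows two properties the design relies on. First, jobs are dispatched round-robin. Second, a `Send` closure can build `!Send` state (here an `Rc`) that never leaves its worker.

```rust
use std::collections::HashSet;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A unit of work: a `Send` closure that runs entirely on one worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical thread-per-core pool: one OS thread and one job queue per worker.
struct WorkerPool {
    senders: Vec<Sender<Job>>,
    handles: Vec<JoinHandle<()>>,
    next: AtomicUsize,
}

impl WorkerPool {
    fn new(n: usize) -> Self {
        assert!(n > 0, "pool needs at least one worker");
        let mut senders = Vec::with_capacity(n);
        let mut handles = Vec::with_capacity(n);
        for _ in 0..n {
            let (tx, rx) = mpsc::channel::<Job>();
            senders.push(tx);
            // Stand-in for a per-thread `tokio_uring::start` runtime:
            // a blocking loop that runs queued jobs in order.
            handles.push(thread::spawn(move || {
                for job in rx {
                    job();
                }
            }));
        }
        WorkerPool { senders, handles, next: AtomicUsize::new(0) }
    }

    /// Ship a `Send` closure to the next worker in round-robin order.
    fn spawn<F: FnOnce() + Send + 'static>(&self, f: F) {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.senders.len();
        self.senders[i].send(Box::new(f)).expect("worker exited");
    }

    /// Close every queue, then wait for the workers to drain and exit.
    fn shutdown(self) {
        drop(self.senders);
        for handle in self.handles {
            handle.join().expect("worker panicked");
        }
    }
}

/// Runs `jobs` closures on a `workers`-thread pool and returns the sorted
/// results plus the number of distinct threads that executed them.
fn run_demo(workers: usize, jobs: usize) -> (Vec<usize>, usize) {
    let pool = WorkerPool::new(workers);
    let (done_tx, done_rx) = mpsc::channel();
    for i in 0..jobs {
        let done_tx = done_tx.clone();
        pool.spawn(move || {
            // `Rc` is `!Send`; it is created and used only on this worker.
            let local = Rc::new(i * 10);
            done_tx.send((*local, thread::current().id())).unwrap();
        });
    }
    drop(done_tx);
    pool.shutdown();
    let mut values = Vec::new();
    let mut threads = HashSet::new();
    for (value, id) in done_rx {
        values.push(value);
        threads.insert(id);
    }
    values.sort();
    (values, threads.len())
}

fn main() {
    let (values, threads) = run_demo(4, 8);
    println!("{values:?} on {threads} threads");
}
```

With 4 workers and 8 jobs, round-robin dispatch gives each worker exactly two jobs, so `run_demo(4, 8)` reports four distinct threads. In the PR, the closure instead builds a future that runs locally on that worker's `io_uring` runtime. It is not called synchronously as it is here.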
   
   This is a demo / experiment intended to measure the impact of 
thread-per-core + `io_uring` on ClickBench, not a production path.
   
   ## What changes are included in this PR?
   
   Two commits:
   
   1. **`bench: demonstrate tokio-uring thread-per-core execution`**
      - `util::tokio_uring_pool`: spawns N OS threads, each running its own `tokio_uring::start` runtime. `pool.spawn(|| async { ... })` ships a `Send` closure to the next worker in round-robin order, where it can build a possibly-`!Send` future that runs locally on that worker's ring.
      - `util::tokio_uring_store::TokioUringObjectStore` — local `ObjectStore` 
whose `get_opts` / `get_ranges` drive reads through 
`tokio_uring::fs::File::read_at`, dispatched across the pool. Writes / list / 
copy / delete delegate to `LocalFileSystem`.
      - ClickBench uses the pool by default on Linux (one worker per CPU). Output partitions are `pool.spawn`-ed so that plan execution itself happens on the io_uring runtimes. Top-level `CoalescePartitionsExec` / `SortPreservingMergeExec` nodes are stripped so their N-partition children fan out across workers; the `SortPreservingMergeExec` is then rebuilt over a `MemorySourceConfig` and re-executed on one worker.
      - Controlled by `--tokio-uring-workers N` (default: `available_parallelism()` on Linux, disabled elsewhere; `0` forces the legacy multi-threaded Tokio path).
   
   2. **`bench: keep tokio-uring reads local to the caller's worker`**
      - Adds an `IN_WORKER` thread-local set by `run_worker`.
      - Fast path in `TokioUringObjectStore::read_ranges_uring` uses 
`tokio_uring::spawn` on the current ring when `in_worker()` is true, skipping 
the round-robin mpsc + oneshot round-trip so bytes land on the same core that 
will consume them.
      - The round-robin path is kept for callers on a non-uring runtime (e.g. planning on the main multi-threaded Tokio runtime).
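The second commit's fast path can be sketched without io_uring. This sketch makes two substitutions. A synchronous slice read stands in for `tokio_uring::fs::File::read_at`, and a blocking single-use `std::sync::mpsc` channel stands in for the oneshot reply. `read_range` and `demo` are hypothetical. The `IN_WORKER` / `run_worker` pair here only mirrors the names in the PR and is not its code.

```rust
use std::cell::Cell;
use std::ops::Range;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

thread_local! {
    // Hypothetical analogue of the PR's `IN_WORKER` flag: set only on pool threads.
    static IN_WORKER: Cell<bool> = Cell::new(false);
}

fn in_worker() -> bool {
    IN_WORKER.with(|flag| flag.get())
}

/// Worker loop: mark this thread as a worker, then run queued jobs.
fn run_worker(rx: Receiver<Job>) {
    IN_WORKER.with(|flag| flag.set(true));
    for job in rx {
        job();
    }
}

/// Hypothetical range read. On a worker it stays on the current thread; from
/// any other thread it ships the read to a worker and blocks on a single-use
/// reply channel (standing in for the mpsc + oneshot round-trip).
fn read_range(data: &'static [u8], range: Range<usize>, to_worker: &Sender<Job>) -> (Vec<u8>, &'static str) {
    if in_worker() {
        return (data[range].to_vec(), "local");
    }
    let (reply_tx, reply_rx) = mpsc::channel();
    to_worker
        .send(Box::new(move || {
            let _ = reply_tx.send(data[range].to_vec());
        }))
        .expect("worker exited");
    (reply_rx.recv().expect("worker dropped the reply"), "dispatched")
}

static DATA: &[u8] = b"hello, io_uring";

fn demo() -> Vec<(Vec<u8>, &'static str)> {
    let (tx, rx) = mpsc::channel::<Job>();
    let worker = thread::spawn(move || run_worker(rx));
    let mut out = Vec::new();

    // Caller outside the pool (like planning on the main runtime): dispatched.
    out.push(read_range(DATA, 0..5, &tx));

    // Caller already on the worker: the same call takes the local fast path.
    let (res_tx, res_rx) = mpsc::channel();
    let tx_for_job = tx.clone();
    tx.send(Box::new(move || {
        let result = read_range(DATA, 7..15, &tx_for_job);
        res_tx.send(result).unwrap();
    }))
    .unwrap();
    out.push(res_rx.recv().unwrap());

    drop(tx);
    worker.join().unwrap();
    out
}

fn main() {
    for (bytes, path) in demo() {
        println!("{:?} via {path}", String::from_utf8_lossy(&bytes));
    }
}
```

The thread-local check costs one `Cell` read. When it succeeds, the read skips both the queue hop and the wait for a reply.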
   
   All changes are confined to `benchmarks/` and Linux-only (`#[cfg(target_os = "linux")]`). No changes to the DataFusion core crates.
   
   ## Are these changes tested?
   
   Exercised via `dfbench clickbench` on Linux. No new unit tests — this is a 
benchmark harness wiring experiment.
   
   ## Are there any user-facing changes?
   
   No public API changes. New CLI flag `--tokio-uring-workers` on the benchmark binary (Linux only; defaults to one worker per CPU, and `0` selects the legacy multi-threaded Tokio path).
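The flag's documented defaults can be expressed as a small pure function. In this sketch, `ExecMode`, `resolve_mode`, and `resolve_from_env` are hypothetical names. Passing the platform check and CPU count in as arguments is an assumption made for testability. Where the PR gates code with `#[cfg(target_os = "linux")]`, the `cfg!` macro is used here instead.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Which execution path the benchmark would take (hypothetical).
#[derive(Debug, PartialEq)]
enum ExecMode {
    /// Legacy multi-threaded Tokio runtime.
    TokioMultiThread,
    /// Thread-per-core pool with this many io_uring workers.
    UringPool(NonZeroUsize),
}

/// `flag` is `None` when `--tokio-uring-workers` was not passed.
fn resolve_mode(flag: Option<usize>, linux: bool, cpus: NonZeroUsize) -> ExecMode {
    if !linux {
        // io_uring is Linux-only: the pool is disabled on other platforms.
        return ExecMode::TokioMultiThread;
    }
    match flag {
        // Default on Linux: one worker per available CPU.
        None => ExecMode::UringPool(cpus),
        // An explicit 0 forces the legacy path; any other value sizes the pool.
        Some(n) => NonZeroUsize::new(n).map_or(ExecMode::TokioMultiThread, ExecMode::UringPool),
    }
}

/// Reads the platform and CPU count from the running environment.
fn resolve_from_env(flag: Option<usize>) -> ExecMode {
    let cpus = thread::available_parallelism().unwrap_or(NonZeroUsize::new(1).unwrap());
    resolve_mode(flag, cfg!(target_os = "linux"), cpus)
}

fn main() {
    println!("default: {:?}", resolve_from_env(None));
    println!("--tokio-uring-workers 0: {:?}", resolve_from_env(Some(0)));
}
```

Because the decision is pure, every case in the flag's description can be checked without a Linux host.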


-- 
This is an automated message from the Apache Git Service.