Dandandan opened a new pull request, #21716:
URL: https://github.com/apache/datafusion/pull/21716
## Which issue does this PR close?
- Closes #.
## Rationale for this change
Explores thread-per-core execution using `tokio-uring` in the benchmark
harness. Each worker runs its own `tokio_uring::start` runtime (a
current-thread Tokio reactor driven by `io_uring`), so Parquet I/O and decode
can be scheduled on the same core that will consume the bytes, avoiding the
cross-thread hops that the default multi-threaded Tokio runtime introduces.
This is a demo / experiment intended to measure the impact of
thread-per-core + `io_uring` on ClickBench, not a production path.
## What changes are included in this PR?
Two commits:
1. **`bench: demonstrate tokio-uring thread-per-core execution`**
- `util::tokio_uring_pool` — spawns N OS threads, each running its own
`tokio_uring::start` runtime. `pool.spawn(|| async { ... })` ships a `Send`
closure to a round-robin worker, where it can build a possibly-`!Send` future
that runs locally on that worker's ring.
- `util::tokio_uring_store::TokioUringObjectStore` — local `ObjectStore`
whose `get_opts` / `get_ranges` drive reads through
`tokio_uring::fs::File::read_at`, dispatched across the pool. Writes / list /
copy / delete delegate to `LocalFileSystem`.
- ClickBench uses the pool by default on Linux (one worker per CPU).
Each output partition is submitted via `pool.spawn`, so plan execution itself
runs on the io_uring runtimes. A top-level `CoalescePartitionsExec` or
`SortPreservingMergeExec` is stripped so that its N-partition child fans out
across workers; for `SortPreservingMergeExec`, the merge is rebuilt over a
`MemorySourceConfig` and re-executed on one worker.
- Enabled via `--tokio-uring-workers N` (default:
`available_parallelism()` on Linux, disabled elsewhere; `0` forces the legacy
multi-threaded Tokio path).
2. **`bench: keep tokio-uring reads local to the caller's worker`**
- Adds an `IN_WORKER` thread-local set by `run_worker`.
- Fast path in `TokioUringObjectStore::read_ranges_uring` uses
`tokio_uring::spawn` on the current ring when `in_worker()` is true, skipping
the round-robin mpsc + oneshot round-trip so bytes land on the same core that
will consume them.
- The round-robin path is kept for callers on a non-uring runtime (planning
on the main multi-threaded Tokio runtime).
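
The dispatch scheme from the two commits above can be modelled with the
standard library alone. The sketch below is not the code in this PR: it uses
plain OS threads and `std::sync::mpsc` channels in place of
`tokio_uring::start` runtimes, and the `Pool` type and `read_len` function are
hypothetical stand-ins. It only illustrates three ideas: a `Send` closure that
builds `!Send` state on the worker, round-robin dispatch, and the `IN_WORKER`
fast path that stays on the current thread.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::Arc;
use std::thread::{self, ThreadId};

// A job is a `Send` closure; whatever it builds on the worker may be `!Send`.
type Job = Box<dyn FnOnce() + Send + 'static>;

thread_local! {
    // Models the `IN_WORKER` thread-local: true only on pool worker threads.
    static IN_WORKER: Cell<bool> = Cell::new(false);
}

fn in_worker() -> bool {
    IN_WORKER.with(|flag| flag.get())
}

// Hypothetical stand-in for the pool: one OS thread per worker.
#[derive(Clone)]
struct Pool {
    senders: Vec<Sender<Job>>,
    next: Arc<AtomicUsize>,
}

impl Pool {
    fn new(workers: usize) -> Self {
        assert!(workers > 0, "this sketch needs at least one worker");
        let senders = (0..workers)
            .map(|_| {
                let (tx, rx) = mpsc::channel::<Job>();
                // Detached thread; it exits once every sender is dropped.
                thread::spawn(move || {
                    IN_WORKER.with(|flag| flag.set(true));
                    // A real worker would enter its own runtime here.
                    for job in rx {
                        job();
                    }
                });
                tx
            })
            .collect();
        Pool { senders, next: Arc::new(AtomicUsize::new(0)) }
    }

    // Round-robin: ship the closure to the next worker and block on its result.
    fn spawn<T, F>(&self, f: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.senders.len();
        let (tx, rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            let _ = tx.send(f());
        });
        self.senders[idx].send(job).expect("worker thread exited");
        rx.recv().expect("worker dropped the result")
    }
}

// Models the fast path: run locally when already on a worker, otherwise
// hop to a round-robin worker. Returns the length and the executing thread.
fn read_len(pool: &Pool, data: Vec<u8>) -> (usize, ThreadId) {
    let work = move || {
        // `Rc` is `!Send`; it is created and dropped on the executing thread.
        let local = Rc::new(data);
        (local.len(), thread::current().id())
    };
    if in_worker() {
        work()
    } else {
        pool.spawn(work)
    }
}
```

Calling `read_len` from the main thread takes the round-robin path, while
calling it from inside a job already running on a worker executes in place
and never touches the channels.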
All changes are confined to `benchmarks/` and Linux-only (`#[cfg(target_os =
"linux")]`). No changes to the DataFusion core crates.
## Are these changes tested?
Exercised via `dfbench clickbench` on Linux. No new unit tests were added,
since this is an experiment in benchmark harness wiring.
## Are there any user-facing changes?
No public API changes. New CLI flag `--tokio-uring-workers` on the benchmark
binary (Linux only); it defaults to `available_parallelism()` workers there,
and `0` opts out.
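
The default behaviour of `--tokio-uring-workers` described in this PR could be
resolved roughly as follows. This is a minimal sketch: the
`resolve_uring_workers` helper is hypothetical, and two points are assumptions
rather than statements from the PR, namely that an explicit `N` is ignored off
Linux and that the count falls back to `1` when `available_parallelism()`
returns an error.

```rust
use std::thread::available_parallelism;

/// Hypothetical helper: maps the optional flag value to a worker count.
/// `None` means "use the legacy multi-threaded Tokio path".
fn resolve_uring_workers(flag: Option<usize>) -> Option<usize> {
    // Assumption: off Linux the pool is unavailable regardless of the flag.
    if !cfg!(target_os = "linux") {
        return None;
    }
    match flag {
        // `0` forces the legacy path.
        Some(0) => None,
        // An explicit worker count is used as given.
        Some(n) => Some(n),
        // Default: one worker per available CPU (fallback of 1 is assumed).
        None => Some(available_parallelism().map(|n| n.get()).unwrap_or(1)),
    }
}
```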
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]