Dandandan opened a new pull request, #21673:
URL: https://github.com/apache/datafusion/pull/21673

   ## Which issue does this PR close?
   
   - N/A (new feature)
   
   ## Rationale for this change
   
   Local file reads in DataFusion currently go through `object_store::local::LocalFileSystem`, which uses synchronous `pread()` calls (or `spawn_blocking` wrappers). For workloads that read many byte ranges from local files, particularly Parquet column chunks, every range costs its own read syscall.
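To make the per-range cost concrete, here is a minimal sketch of the baseline access pattern. It assumes a Unix target because it relies on `std::os::unix::fs::FileExt`. `read_ranges_blocking` is a hypothetical helper, not `LocalFileSystem`'s actual code. Each range becomes one positional read (`pread()` under the hood), so N ranges mean at least N syscalls.

```rust
use std::fs::File;
use std::io;
use std::ops::Range;
use std::os::unix::fs::FileExt;
use std::path::Path;

/// Hypothetical baseline: one positional read per byte range.
/// `read_exact_at` is built on `pread()`, so N ranges cost at least N syscalls.
fn read_ranges_blocking(path: &Path, ranges: &[Range<u64>]) -> io::Result<Vec<Vec<u8>>> {
    let file = File::open(path)?;
    ranges
        .iter()
        .map(|r| -> io::Result<Vec<u8>> {
            // Empty ranges (e.g. 5..5) produce an empty buffer without any read call.
            let mut buf = vec![0u8; r.end.saturating_sub(r.start) as usize];
            file.read_exact_at(&mut buf, r.start)?;
            Ok(buf)
        })
        .collect()
}
```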
   
   Linux's `io_uring` interface allows batching multiple read operations into a single `io_uring_enter()` syscall. This reduces user/kernel transitions and hands the kernel the whole batch of reads at once, so it can schedule them together rather than one blocking call at a time.
   
   ## What changes are included in this PR?
   
   Introduces a new `datafusion-object-store-iouring` crate that provides 
`IoUringObjectStore`:
   
   - **Dedicated io_uring worker thread** with a 256-entry submission queue
   - **Channel-based communication**: unbounded mpsc for requests, oneshot for 
responses
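   - **Batched reads**: `get_ranges()` pushes every byte range as a submission queue entry (SQE) and submits them together, so a request that fits in the ring costs a single `io_uring_enter()` call
   - **Chunked submission**: requests with more ranges than the ring's 256 entries are split into ring-sized chunks and submitted chunk by chunk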
   - **Fallback**: on non-Linux platforms, all operations delegate to 
`LocalFileSystem`
   - **Feature flag**: `io-uring` on `datafusion-execution` to opt in via 
`DefaultObjectStoreRegistry`
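The batching and chunking described above can be modeled without the `io_uring` crate. This is a hypothetical, std-only sketch. `plan_batches` splits a request into ring-sized batches, each of which would correspond to one round of SQE pushes followed by one `submit_and_wait()`. `reorder_completions` assumes each SQE carries its range index in `user_data`, because completions (CQEs) are not guaranteed to arrive in submission order; the PR does not say how results are matched back to ranges.

```rust
use std::ops::Range;

/// Ring size stated in the PR (256 submission queue entries).
const RING_ENTRIES: usize = 256;

/// Splits a request into batches that each fit in the submission queue.
/// Hypothetical: each batch would be pushed as SQEs, then submitted with one syscall.
fn plan_batches(ranges: &[Range<u64>], ring_entries: usize) -> Vec<&[Range<u64>]> {
    assert!(ring_entries > 0, "ring must have at least one entry");
    ranges.chunks(ring_entries).collect()
}

/// Places each completion back at its original position.
/// Assumes the range index was stored in the SQE's `user_data` field.
fn reorder_completions(n: usize, completions: Vec<(u64, Vec<u8>)>) -> Vec<Vec<u8>> {
    let mut slots: Vec<Option<Vec<u8>>> = vec![None; n];
    for (user_data, data) in completions {
        slots[user_data as usize] = Some(data);
    }
    slots
        .into_iter()
        .map(|s| s.expect("missing completion for a submitted range"))
        .collect()
}
```

Under these assumptions, 600 ranges with a 256-entry ring yield three batches (256, 256, 88), i.e. three `io_uring_enter()` calls instead of 600 individual reads.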
   
   Architecture:
   ```
   Tokio async tasks                    Dedicated io_uring thread
   ─────────────────                    ────────────────────────
                                        IoUring (256 entries)
     send(ReadRanges) ──► mpsc ──►      submit SQEs (batch)
                                        submit_and_wait()
     await oneshot    ◄── oneshot ◄──   collect CQEs, send Bytes
   ```
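The request/response flow in the diagram can be modeled with standard-library channels. This sketch is hypothetical: the `ReadRanges` fields are assumptions, `std::sync::mpsc` stands in for the async unbounded mpsc, a capacity-1 `sync_channel` stands in for the oneshot, and the worker performs blocking `read_exact_at` calls where the real worker would push SQEs and call `submit_and_wait()`. It is Unix-only because of `FileExt`.

```rust
use std::fs::File;
use std::io;
use std::ops::Range;
use std::os::unix::fs::FileExt;
use std::path::PathBuf;
use std::sync::mpsc::{self, Receiver, Sender, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical request message, named after `ReadRanges` in the diagram.
struct ReadRanges {
    path: PathBuf,
    ranges: Vec<Range<u64>>,
    /// Stand-in for a oneshot sender: a bounded channel of capacity 1, used once.
    reply: SyncSender<io::Result<Vec<Vec<u8>>>>,
}

/// Spawns the dedicated worker thread. A real io_uring worker would push one SQE
/// per range and call `submit_and_wait`; this sketch reads synchronously instead.
fn spawn_worker() -> (Sender<ReadRanges>, JoinHandle<()>) {
    let (tx, rx): (Sender<ReadRanges>, Receiver<ReadRanges>) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The loop ends once every Sender has been dropped.
        for req in rx {
            let result: io::Result<Vec<Vec<u8>>> = File::open(&req.path).and_then(|file| {
                req.ranges
                    .iter()
                    .map(|r| -> io::Result<Vec<u8>> {
                        let mut buf = vec![0u8; r.end.saturating_sub(r.start) as usize];
                        file.read_exact_at(&mut buf, r.start)?;
                        Ok(buf)
                    })
                    .collect()
            });
            // The caller may have gone away; ignore a closed reply channel.
            let _ = req.reply.send(result);
        }
    });
    (tx, handle)
}

/// Caller side: send the request, then block on the reply.
/// An async caller would `.await` a oneshot receiver here instead.
fn read_ranges(
    tx: &Sender<ReadRanges>,
    path: PathBuf,
    ranges: Vec<Range<u64>>,
) -> io::Result<Vec<Vec<u8>>> {
    let (reply, rx) = mpsc::sync_channel(1);
    tx.send(ReadRanges { path, ranges, reply })
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "worker stopped"))?;
    rx.recv()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "worker dropped the reply"))?
}
```

Having one long-lived thread own the ring means the ring itself needs no locking; callers only pay for a channel send and a reply.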
   
   Write, list, copy, and delete operations are delegated to `LocalFileSystem`.
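The non-Linux fallback can be decided at compile time. A minimal sketch, assuming `cfg(target_os = "linux")` gating (the PR describes the fallback but not how it is gated); `Backend` and `select_backend` are hypothetical names.

```rust
/// Hypothetical backend choice for the read path.
#[derive(Debug, PartialEq)]
enum Backend {
    IoUring,
    LocalFileSystem,
}

/// On Linux, reads go through the io_uring worker.
#[cfg(target_os = "linux")]
fn select_backend() -> Backend {
    Backend::IoUring
}

/// Everywhere else, every operation falls back to `LocalFileSystem`.
#[cfg(not(target_os = "linux"))]
fn select_backend() -> Backend {
    Backend::LocalFileSystem
}
```

Operations that never touch the ring (writes, list, copy, delete) can forward to the wrapped `LocalFileSystem` on every platform, so only the read path differs between targets.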
   
   ## Are these changes tested?
   
   Yes — 6 unit tests covering:
   - `put` + `get` round-trip
   - Single range reads
   - Multi-range batch reads
   - Head (metadata) requests
   - List operations
   - Empty range edge case
   
   These tests were run on the non-Linux fallback path (macOS) and validate the `ObjectStore` contract. Fully exercising the io_uring code path requires running them on Linux.
   
   ## Are there any user-facing changes?
   
   New optional crate `datafusion-object-store-iouring` and feature flag 
`io-uring` on `datafusion-execution`. No changes to default behavior.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

