geoffreyclaude opened a new issue, #2601:
URL: https://github.com/apache/iceberg-rust/issues/2601

   ### Is your feature request related to a problem or challenge?
   
   Iceberg has a `Runtime` abstraction with separate handles for IO-bound and 
CPU-bound work. However, `FileIO` and the storage layer do not currently route 
storage operations through that runtime.
   
   Users who intentionally separate Tokio runtimes for CPU work and IO work 
cannot enforce that separation for storage access today. Even when a catalog or 
table is constructed with a runtime, storage operations such as metadata file 
reads and data-file byte-range reads may still execute on the caller's current 
runtime.
   
   This is especially relevant for DataFusion scans. The Parquet decode, 
filtering, projection, and batch transformation work should remain on the 
runtime that polls the returned `RecordBatchStream`, but storage byte-range 
reads should be able to run through `runtime.io()`.
   
   In other words, runtime routing should happen at the storage boundary, not 
by moving the whole scan stream onto an IO runtime.
   
   ### Describe the solution you'd like
   
   Add runtime-aware storage routing under `FileIO`.
   
   A possible shape:
   
   - allow `FileIO` / `FileIOBuilder` to receive an Iceberg `Runtime`, or an IO 
runtime handle;
   - keep concrete storage backends runtime-agnostic;
   - wrap the raw `Storage` implementation in a private runtime-aware adapter 
when an IO runtime is configured;
   - wrap returned `FileRead` / `FileWrite` objects so delayed range reads, 
writes, and close operations also route through the IO runtime;
   - ensure tables built with a configured `Runtime` also bind their `FileIO` 
to that runtime;
   - add DataFusion runtime-aware constructors so catalog-backed providers can 
propagate the runtime to loaded tables.
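
To make the adapter idea concrete, here is a minimal std-only sketch of the wrapping step. Everything in it is hypothetical: `IoHandle` stands in for `runtime.io()` by running each job on a named thread, the `Storage` trait is reduced to two synchronous methods, and `RuntimeAwareStorage` / `build_storage` are illustrative names rather than existing iceberg-rust APIs.

```rust
use std::collections::HashMap;
use std::ops::Range;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for `runtime.io()`: each job runs on a short-lived
/// thread carrying this name. A real handle would spawn onto a Tokio runtime.
#[derive(Clone)]
pub struct IoHandle(pub String);

impl IoHandle {
    /// Run `job` on the IO side and block until it returns.
    pub fn run<T, F>(&self, job: F) -> T
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        thread::Builder::new()
            .name(self.0.clone())
            .spawn(job)
            .expect("failed to spawn IO thread")
            .join()
            .expect("IO job panicked")
    }
}

/// Simplified, synchronous stand-in for the storage trait (assumption).
pub trait Storage: Send + Sync {
    fn exists(&self, path: &str) -> bool;
    fn read(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>>;
}

/// Runtime-agnostic backend that logs which thread served each call.
#[derive(Default)]
pub struct MemoryStorage {
    files: HashMap<String, Vec<u8>>,
    served_by: Mutex<Vec<String>>,
}

impl MemoryStorage {
    pub fn with_file(path: &str, bytes: &[u8]) -> Self {
        let mut storage = MemoryStorage::default();
        storage.files.insert(path.to_string(), bytes.to_vec());
        storage
    }

    /// Thread names recorded so far, in call order.
    pub fn served_by(&self) -> Vec<String> {
        self.served_by.lock().unwrap().clone()
    }

    fn record(&self) {
        let name = thread::current().name().unwrap_or("unnamed").to_string();
        self.served_by.lock().unwrap().push(name);
    }
}

impl Storage for MemoryStorage {
    fn exists(&self, path: &str) -> bool {
        self.record();
        self.files.contains_key(path)
    }

    fn read(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        self.record();
        self.files.get(path)?.get(range).map(|bytes| bytes.to_vec())
    }
}

/// Private adapter: forwards every call to the raw backend via the IO handle.
struct RuntimeAwareStorage {
    inner: Arc<dyn Storage>,
    io: IoHandle,
}

impl Storage for RuntimeAwareStorage {
    fn exists(&self, path: &str) -> bool {
        let (inner, path) = (Arc::clone(&self.inner), path.to_string());
        self.io.run(move || inner.exists(&path))
    }

    fn read(&self, path: &str, range: Range<usize>) -> Option<Vec<u8>> {
        let (inner, path) = (Arc::clone(&self.inner), path.to_string());
        self.io.run(move || inner.read(&path, range))
    }
}

/// Wrap only when an IO handle is configured; otherwise return the backend as-is.
pub fn build_storage(raw: Arc<dyn Storage>, io: Option<IoHandle>) -> Arc<dyn Storage> {
    match io {
        Some(io) => Arc::new(RuntimeAwareStorage { inner: raw, io }),
        None => raw,
    }
}
```

The backend never sees the handle, which is the point of keeping concrete storage implementations runtime-agnostic: only the builder decides whether the adapter is inserted.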
   
   The expected behavior would be:
   
   ```text
   DataFusion / caller runtime
     |
     +-- poll Iceberg scan stream
     |     --> Parquet decode / filtering / projection
     |
     +-- Iceberg FileIO
           |
           v
         runtime.io()
           |
           +-- metadata file reads
           +-- FileRead::read(range)
           +-- FileWrite::{write, close}
   ```
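
The split in the diagram can be sketched with a wrapped reader: the byte-range read hops to the IO side while the "decode" step stays on the calling thread. This is a std-only illustration under stated assumptions; `run_on_io`, `BytesReader`, `RoutedReader`, `open_reader`, and `scan_chunk` are hypothetical names, the `FileRead` trait is reduced to one synchronous method, and a named thread stands in for `runtime.io()`.

```rust
use std::ops::Range;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `runtime.io()`: run `job` on a thread named
/// "iceberg-io" and wait for its result.
fn run_on_io<T, F>(job: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    thread::Builder::new()
        .name("iceberg-io".to_string())
        .spawn(job)
        .expect("failed to spawn IO thread")
        .join()
        .expect("IO job panicked")
}

fn current_thread_name() -> String {
    thread::current().name().unwrap_or("unnamed").to_string()
}

/// Simplified, synchronous stand-in for `FileRead` (assumption).
pub trait FileRead: Send + Sync {
    /// Returns the requested bytes plus the name of the thread that served them.
    fn read(&self, range: Range<usize>) -> (Vec<u8>, String);
}

/// Raw in-memory reader; it knows nothing about runtimes.
pub struct BytesReader(pub Vec<u8>);

impl FileRead for BytesReader {
    fn read(&self, range: Range<usize>) -> (Vec<u8>, String) {
        // Panics if `range` is out of bounds; acceptable for a sketch.
        (self.0[range].to_vec(), current_thread_name())
    }
}

/// Wrapper handed back to callers: every later `read` is routed to the IO side.
pub struct RoutedReader {
    inner: Arc<dyn FileRead>,
}

impl FileRead for RoutedReader {
    fn read(&self, range: Range<usize>) -> (Vec<u8>, String) {
        let inner = Arc::clone(&self.inner);
        run_on_io(move || inner.read(range))
    }
}

/// Stand-in for `InputFile::reader`: opening is routed, and so is the result.
pub fn open_reader(bytes: Vec<u8>) -> RoutedReader {
    let inner = run_on_io(move || Arc::new(BytesReader(bytes)) as Arc<dyn FileRead>);
    RoutedReader { inner }
}

/// Caller side of a scan: fetch bytes through the reader, then "decode" them
/// (here: count non-zero bytes) on whichever thread polls the scan.
pub fn scan_chunk(reader: &dyn FileRead, range: Range<usize>) -> (usize, String, String) {
    let (bytes, io_thread) = reader.read(range);
    let decoded = bytes.iter().filter(|b| **b != 0).count();
    (decoded, io_thread, current_thread_name())
}
```

Calling `scan_chunk(&open_reader(data), range)` from any thread reports the read as served by `iceberg-io` and the decode as running on the caller, which mirrors routing at the storage boundary rather than moving the whole scan stream.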
   
   Non-goals:
   
   - changing scan partitioning;
   - adding eager file planning;
   - changing DataFusion physical-plan shape;
   - moving all Iceberg metadata processing onto a CPU runtime.
   
   Testing ideas:
   
   - verify `FileIO::exists` routes through the configured IO runtime;
   - verify `InputFile::reader` and later `FileRead::read(range)` both route 
through the IO runtime;
   - verify `OutputFile::writer`, `FileWrite::write`, and `FileWrite::close` 
route through the IO runtime;
   - verify DataFusion catalog-backed table construction can propagate a 
runtime;
   - verify existing memory/local filesystem behavior remains unchanged without 
a configured runtime.
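
One way the write-path checks could be structured is with a probe writer that logs the thread each operation runs on. The sketch below is std-only and hypothetical throughout: `ProbeWriter`, `RoutedWriter`, `run_on_io`, and `routed_writer_calls` are illustrative names, the writer API is synchronous, and a named thread stands in for `runtime.io()`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared log of (operation, thread name) pairs.
type CallLog = Arc<Mutex<Vec<(String, String)>>>;

/// Hypothetical stand-in for `runtime.io()`: run `job` on a thread named
/// "iceberg-io" and wait for its result.
fn run_on_io<T, F>(job: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    thread::Builder::new()
        .name("iceberg-io".to_string())
        .spawn(job)
        .expect("failed to spawn IO thread")
        .join()
        .expect("IO job panicked")
}

/// Probe backend writer: records where `write` and `close` execute.
pub struct ProbeWriter {
    log: CallLog,
    buffered: usize,
}

impl ProbeWriter {
    fn record(&self, op: &str) {
        let name = thread::current().name().unwrap_or("unnamed").to_string();
        self.log.lock().unwrap().push((op.to_string(), name));
    }

    pub fn write(&mut self, bytes: &[u8]) {
        self.record("write");
        self.buffered += bytes.len();
    }

    /// Returns the total number of bytes written.
    pub fn close(self) -> usize {
        self.record("close");
        self.buffered
    }
}

/// Wrapper under test: moves the inner writer to the IO side for each call.
pub struct RoutedWriter {
    inner: Option<ProbeWriter>,
}

impl RoutedWriter {
    pub fn write(&mut self, bytes: &[u8]) {
        let mut inner = self.inner.take().expect("writer already closed");
        let bytes = bytes.to_vec();
        // Hand the writer back once the routed write finishes.
        self.inner = Some(run_on_io(move || {
            inner.write(&bytes);
            inner
        }));
    }

    pub fn close(self) -> usize {
        let inner = self.inner.expect("writer already closed");
        run_on_io(move || inner.close())
    }
}

/// Drives two writes and a close, then returns the byte count and the call log.
pub fn routed_writer_calls() -> (usize, Vec<(String, String)>) {
    let log = CallLog::default();
    let probe = ProbeWriter { log: Arc::clone(&log), buffered: 0 };
    let mut writer = RoutedWriter { inner: Some(probe) };
    writer.write(b"abc");
    writer.write(b"de");
    let written = writer.close();
    let calls = log.lock().unwrap().clone();
    (written, calls)
}
```

A test can then assert that every logged operation ran on the IO thread. The same probe used without the wrapper would log the caller's thread instead, which covers the "unchanged without a configured runtime" case.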
   
   Disclosure: this issue text was drafted with assistance from Codex and 
reviewed before filing.
   
   ### Willingness to contribute
   
   I can contribute to this feature independently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

