vustef opened a new pull request, #1684: URL: https://github.com/apache/iceberg-rust/pull/1684
## Which issue does this PR close?

- Closes #.

## What changes are included in this PR?

### Problem

The iceberg-rust arrow reader had a critical performance bottleneck: CPU-intensive parquet operations (decompression, decoding) ran on a single thread, even though multiple files were being processed concurrently.

**Root cause**: While `try_buffer_unordered()` enabled concurrent file processing, `try_flatten_unordered()` polled each file's `ArrowRecordBatchStream` sequentially on the same thread. As a result, the CPU-heavy `ParquetRecordBatchStream::poll_next()` calls, which perform parquet decompression and decoding, were all bottlenecked on a single thread.

**Impact**: Multi-core systems were severely underutilized when processing multiple parquet files, as evidenced by flamegraphs showing single-threaded execution in the critical parquet processing path.

### Solution

Modified the arrow reader so that each file's parquet stream is polled on a dedicated tokio task:

1. Added a `stream_to_receiver()` method: wraps each `ArrowRecordBatchStream` in a `tokio::spawn()` task that performs the CPU-intensive polling. **Consideration**: perhaps we should use `spawn_blocking`.
2. Updated `process_file_scan_task()`: each file's stream is now wrapped with `stream_to_receiver()` before being returned.
3. Preserved the existing concurrency patterns: the same `try_buffer_unordered()` + `try_flatten_unordered()` flow is used, but the underlying polling now scales across threads.

### How it fixes the issue

- Before: all parquet streams were polled sequentially, creating a single-threaded CPU bottleneck. A TPCH lineitem scan on SF100 on an r7i.8xlarge EC2 instance took 56s.
  Network throughput: <img width="1044" height="278" alt="image" src="https://github.com/user-attachments/assets/20dfd1d2-6205-4ca5-9f3b-1fb5ca5c60a7" />
- After: each parquet stream is polled in its own `tokio::spawn()` task, giving true multi-threaded CPU utilization. The TPCH lineitem scan now takes 20s.
  Network throughput: <img width="855" height="337" alt="image" src="https://github.com/user-attachments/assets/8c775d45-27e5-43a1-83f4-c7d963896187" />

The key insight was that `try_flatten_unordered()` only provides async concurrency for the futures that *create* the streams; the actual stream polling, where the CPU work happens, was still sequential. By moving each stream's polling to its own tokio task via `stream_to_receiver()`, multiple files' CPU-intensive parquet operations can run simultaneously on different threads.

**Additional considerations**:
- [ ] Should we use `spawn_blocking` in `stream_to_receiver()`?
- [ ] Should we add a new configuration method on `ScanBuilder` to limit this new concurrency, or is it fine that it is the same as `concurrency_limit_data_files`?

## Are these changes tested?