[PR] Remove waits from blocking threads reading spill files. [datafusion]

via GitHub Thu, 10 Apr 2025 14:28:36 -0700


ashdnazg opened a new pull request, #15654:
URL: https://github.com/apache/datafusion/pull/15654


   Based on #15653, will be rebased when that PR is merged.
   
   ## Which issue does this PR close?
   
   - Closes https://github.com/apache/datafusion/issues/15323.
   
   ## Rationale for this change
   
   The previous design for reading spill files was a `push` design: it spawned
   long-lived blocking tasks that repeatedly read a record, sent it, and
   waited until it was received. Progress was not guaranteed under this design
   (i.e., it could deadlock) when more spill files had to be read together
   than there were threads in tokio's blocking thread pool, because every
   pool thread could end up parked waiting for its record to be received.
   
   To solve this, the design is changed to a `pull` design, where blocking
   tasks are spawned for every read, removing waiting on the IO threads and
   guaranteeing progress.
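   
   The `pull` idea can be sketched with the standard library alone. This is
   an illustration under stated assumptions, not the code in this PR:
   `BlockingPool`, `SpillFile`, `read_next` and `merge_spills` are
   hypothetical names, the pool stands in for tokio's blocking thread pool,
   and an in-memory queue stands in for a spill file. Each read is a short
   job that returns as soon as it has one record, so a single worker can
   serve three files without ever waiting on the consumer.
   
   ```rust
   use std::collections::VecDeque;
   use std::sync::mpsc;
   use std::sync::{Arc, Mutex};
   use std::thread;
   
   // Hypothetical stand-in for a blocking thread pool with a fixed worker count.
   type Job = Box<dyn FnOnce() + Send + 'static>;
   
   struct BlockingPool {
       sender: mpsc::Sender<Job>,
   }
   
   impl BlockingPool {
       fn new(workers: usize) -> Self {
           let (sender, receiver) = mpsc::channel::<Job>();
           let receiver = Arc::new(Mutex::new(receiver));
           for _ in 0..workers {
               let receiver = Arc::clone(&receiver);
               thread::spawn(move || loop {
                   // The lock is released at the end of this statement,
                   // before the job itself runs.
                   let job = receiver.lock().unwrap().recv();
                   match job {
                       Ok(job) => job(),
                       Err(_) => break, // pool was dropped
                   }
               });
           }
           BlockingPool { sender }
       }
   
       // Run `f` on a pool thread and block the caller until it returns.
       fn run<T: Send + 'static>(&self, f: impl FnOnce() -> T + Send + 'static) -> T {
           let (tx, rx) = mpsc::channel();
           let job: Job = Box::new(move || {
               let _ = tx.send(f());
           });
           self.sender.send(job).unwrap();
           rx.recv().unwrap()
       }
   }
   
   // Hypothetical stand-in for a sorted spill file on disk.
   type SpillFile = Arc<Mutex<VecDeque<u32>>>;
   
   // Pull design: one short job per read. The pool thread is free again as
   // soon as the record has been read.
   fn read_next(pool: &BlockingPool, file: &SpillFile) -> Option<u32> {
       let file = Arc::clone(file);
       pool.run(move || {
           let mut guard = file.lock().unwrap();
           guard.pop_front()
       })
   }
   
   // Merge sorted files by pulling the next record only from the file whose
   // head was just consumed.
   fn merge_spills(pool: &BlockingPool, files: &[SpillFile]) -> Vec<u32> {
       let mut heads: Vec<Option<u32>> = files.iter().map(|f| read_next(pool, f)).collect();
       let mut out = Vec::new();
       loop {
           let next = heads
               .iter()
               .enumerate()
               .filter_map(|(i, h)| h.map(|v| (v, i)))
               .min();
           match next {
               Some((value, i)) => {
                   out.push(value);
                   heads[i] = read_next(pool, &files[i]);
               }
               None => return out,
           }
       }
   }
   
   fn main() {
       // One worker, three files: fewer pool threads than spill files.
       let pool = BlockingPool::new(1);
       let files: Vec<SpillFile> = vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]]
           .into_iter()
           .map(|v| Arc::new(Mutex::new(VecDeque::from(v))))
           .collect();
       println!("{:?}", merge_spills(&pool, &files)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
   }
   ```
   
   In a `push` version of the same sketch, each file would hold a worker for
   its whole lifetime, so the single worker would be stuck on the first file
   while the merge waits for the heads of the other two.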
   
   While there might be an added overhead for repeatedly calling
   `spawn_blocking`, it's probably insignificant compared to the IO cost of
   reading from the disk.
   
   ## Are these changes tested?
   
   The existing tests all pass.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

