martin-g commented on code in PR #20067:
URL: https://github.com/apache/datafusion/pull/20067#discussion_r2797178101


##########
datafusion/physical-plan/src/spill/spill_pool.rs:
##########
@@ -1441,4 +1441,44 @@ mod tests {
 
         Ok(())
     }
+    
+    #[tokio::test(flavor = "multi_thread", worker_threads = 1)]
+    async fn test_concurrent_writer_reader_race_condition() -> Result<()> {
+        // stress testing the concurncy in the reader and the reader to make 
sure there is now race condtion

Review Comment:
   ```suggestion
           // stress testing the concurrency in the reader and the writer to 
make sure there is now race condition
   ```



##########
datafusion/physical-plan/src/spill/spill_manager.rs:
##########


Review Comment:
   Now, `read_spill_as_stream()` is exactly the same as 
`read_spill_as_stream_unbuffered()` below.
   Maybe the buffering impl should have been preserved but the caller with the 
issue should have been changed to use `read_spill_as_stream_unbuffered()` ?!



##########
datafusion/physical-plan/src/spill/spill_manager.rs:
##########


Review Comment:
   The more I dig into this, the more I think there should be a better solution.
   The PR solves the issue by removing the pre-fetching of spilled data.
   IMO we should focus on finding the reason why the pre-fetching gets the 
wrong EOF (and drops the reader) and fix it.
   https://github.com/apache/datafusion/issues/20027#issuecomment-3818140067



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to