pepijnve commented on issue #16482: URL: https://github.com/apache/datafusion/issues/16482#issuecomment-2991955349
I had a look at this already while working on #16398 but got stuck on figuring out how to actually test it. There's a lot going on in the combination of `SpillManager` and SPM that makes it difficult to be sure you're testing the actual task budget consumption aspect.

`SpillManager::read_spill_as_stream` is the function where the cooperative decorator was added in #16398. This stream immediately gets wrapped in a `RecordBatchReceiverStream` created by `spawn_buffered`. The consequence is that, from the point of view of the consumer of the result of `SpillManager::read_spill_as_stream`, the stream is already cooperative, since `RecordBatchReceiverStream` uses a tokio mpsc channel. So what we're really trying to test is that the inner task spawned in `spawn_buffered`, which sends to the channel, yields every now and then. Since the channel's buffer is bounded in size, this is likely to happen naturally already: once the buffer is full, the send has to wait for the consumer.

So, long story short... I think we might be able to test this by:

- Writing a test focused on `spawn_buffered` with a sufficiently large buffer size
- Setting up a very fast consumer of the `RecordBatchReceiverStream` so that the buffer always has room
- Using an artificially slow (i.e. do a blocking sleep or something like that in `poll_next`), always-ready producer stream

With that combination I think you could in theory have another case of a stuck task at https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/common.rs#L108. That would manifest itself as the runtime not being able to cleanly shut down, because a deeply hidden inner spawned task refuses to abort.
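To make the failure mode concrete, here is a std-only toy model (no tokio, no DataFusion), assuming that an abort, like tokio's `JoinHandle::abort`, can only take effect between polls, i.e. after the task returns `Pending`. `ProducerTask` is a hypothetical stand-in for the task spawned in `spawn_buffered`: it drains an always-ready producer into a buffer that never fills, so nothing on the channel side forces it to yield. Its `budget` field loosely mimics the cooperative decorator from #16398. `ProducerTask`, `NoopWaker` and `run` are illustrative names, not DataFusion or tokio APIs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing: the toy executor below simply re-polls after `Pending`.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for the task spawned by `spawn_buffered`: it drains an
/// always-ready producer into a buffer that never fills up, so the "channel"
/// never forces it to return `Pending`.
struct ProducerTask {
    remaining: usize,      // items the always-ready producer still has
    buffer: Vec<usize>,    // stands in for a channel with plenty of room
    budget: Option<usize>, // items per poll before yielding; `None` = never yield
}

impl Future for ProducerTask {
    type Output = usize;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<usize> {
        let mut used = 0;
        while self.remaining > 0 {
            if self.budget == Some(used) {
                // Budget exhausted: schedule another poll and hand control back,
                // which gives the executor a chance to act on an abort.
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            // A real producer would do slow, blocking work here.
            self.remaining -= 1;
            let item = self.buffer.len();
            self.buffer.push(item);
            used += 1;
        }
        Poll::Ready(self.buffer.len())
    }
}

/// Toy executor: polls `task` until it completes. If `abort_after_first_poll`
/// is set, an abort is requested while the first poll is running; it can only
/// take effect once the task returns `Pending`. Returns `(polls, completed)`.
fn run<F: Future>(mut task: Pin<Box<F>>, abort_after_first_poll: bool) -> (usize, bool) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if task.as_mut().poll(&mut cx).is_ready() {
            return (polls, true);
        }
        if abort_after_first_poll {
            // Dropping the future at a yield point is how the abort takes effect.
            return (polls, false);
        }
    }
}

fn main() {
    let task = |budget| Box::pin(ProducerTask { remaining: 1_000, buffer: Vec::new(), budget });
    // No budget: all 1000 items are produced in one poll, so the abort never gets a chance.
    println!("{:?}", run(task(None), true)); // prints (1, true)
    // Budget of 16: the task yields after its first 16 items and the abort takes effect.
    println!("{:?}", run(task(Some(16)), true)); // prints (1, false)
    // Budget of 16, no abort: ceil(1000 / 16) = 63 polls to finish.
    println!("{:?}", run(task(Some(16)), false)); // prints (63, true)
}
```

The abort is modeled as "drop the future between polls", which is the property the proposed test would rely on. With a finite producer the unbudgeted task merely finishes before the abort lands; with an endless always-ready producer it would never return, which is the stuck-task case. A real test would of course need tokio and DataFusion's `spawn_buffered` instead of this toy executor.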