pepijnve commented on PR #16322:
URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2958304164

   Waiting for benchmarks results here so I have some time to write up my 
assessment of what was happening and what has changed. This is just to assist 
any reviewers, not to replace review or justify the changes.
   
   #### Initial test failure
   
   The congestion test set up three streams to merge:
   1. returns Ready(None) and panics if polled again. This tries to check that 
None is respected as terminal result
   2. return Pending until stream 3 has been polled, then always returns 
Ready(None)
   3. clears congestion for 2 and always returns Ready(None)
   
   After the initial change the congestion test started failing. The poll 
sequence I observed was
   1. stream1.poll_next -> None
   2. stream2.poll_next -> Pending
   3. stream3.poll_next -> Ready(None)
   
   and then the test hung. This turned out to be caused by CongestedStream 
returning pending, but not holding on to the waker to wake it when the 
congestion clears. The initial phase of SPM was now assuming it would be woken 
up by any stream that returned pending and that wasn't happening.
   
   #### CongestedStream fix
   
   I don't believe it's valid for a Stream to return pending but not set up 
waking, so I fixed that and then got this poll sequence.
   
   1. stream1.poll_next -> None
   2. stream2.poll_next -> Pending
   3. stream3.poll_next -> Ready(None)
   4. stream2.poll_next -> Ready(None)
   
   and the test passed.
   
   #### Using swap_remove
   
   Based on initial review feedback I restored the usage of swap_remove which 
results in the following sequence.
   
   1. stream1.poll_next -> None
   2. stream3.poll_next -> Ready(None)
   3. stream2.poll_next -> Ready(None)
   
   Due to swap_remove changing the poll order, stream 3 was getting polled 
before stream 2. As a consequence we never hit the congestion situation and the 
test case passes, but no longer tests what it says it does.
   
   #### Testing congestion again
   
   I adapted CongestedStream to keep track of the set of streams that have been 
polled. Both stream 2 and 3 are now treated as congested unless all streams 
have been polled. This results in the following sequence.
   
   1. stream1.poll_next -> None
   2. stream3.poll_next -> Pending
   3. stream2.poll_next -> Ready(None)
   4. stream3.poll_next -> Ready(None)
   
   The test case now hits the congestion point again. As a bonus the test is no 
longer dependent on the exact order in which the streams are polled which 
decouples it from the implementation details a bit more.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to