rtpsw commented on PR #35874:
URL: https://github.com/apache/arrow/pull/35874#issuecomment-1573862020

   > Thinking out loud here:
   > 
   > What we want to test is that if the throughput of the asof join node is 
slower than that of the source, then we pause the source. Two potential ways 
I think we can reliably do this: (1) Add some sort of "debug options" to 
manipulate the behavior of asof join to make it run slower (e.g., sleep a few 
seconds before actually starting the work in the processing thread). (2) Add a 
downstream node to asof join that processes data slowly (similar to a slow data 
sink), e.g., one batch per second. This way, the backpressure would be 
pushed from the slow sink to asof join and then to the data sources.
   > 
   > I think I prefer (2) a bit more because it represents a real-life 
case of a slow sink.
   > 
   > @westonpace I am not sure if the idea of GatedSourceNode is similar or 
different, but happy to hear your thoughts.
   
   While I'm not sure exactly what Weston has in mind, my understanding is that 
the GatedSourceNode's goal is to avoid flakiness due to non-deterministic 
timing. IMO, both (1) and (2) above could be flaky due to non-deterministic 
timing.
   
   Between (1) and (2), I also wouldn't pick (1), because the debug options 
would change the behavior of the as-of-join node under test, and I prefer to 
change the code driving it instead.
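
To illustrate the difference between sleep-based slowness and a gated driver, here is a minimal Python sketch. It is not Arrow or Acero code: `GatedSource`, `BackpressureQueue`, `run_scenario`, and the `low`/`high` thresholds are all hypothetical names made up for this example, and the real GatedSourceNode may work differently. The point is only that when the test decides exactly when batches are released and drained, pause/resume can be asserted without timing assumptions.

```python
from collections import deque


class GatedSource:
    """Hypothetical source: emits batches only when the test releases them."""

    def __init__(self, batches):
        self._pending = deque(batches)
        self.paused = False
        self.pause_count = 0
        self.resume_count = 0

    def pause(self):
        # Count only real transitions so the test can assert exact numbers.
        if not self.paused:
            self.paused = True
            self.pause_count += 1

    def resume(self):
        if self.paused:
            self.paused = False
            self.resume_count += 1

    def release(self, sink, n=1):
        """Push up to n batches into sink, stopping early once paused."""
        emitted = 0
        while emitted < n and self._pending and not self.paused:
            sink.receive(self._pending.popleft())
            emitted += 1
        return emitted


class BackpressureQueue:
    """Hypothetical stand-in for the node under test.

    Pauses the source when the queue reaches `high` batches and resumes it
    once the queue has drained to `low` batches or fewer.
    """

    def __init__(self, source, low, high):
        self._source = source
        self._low = low
        self._high = high
        self.queue = deque()

    def receive(self, batch):
        self.queue.append(batch)
        if len(self.queue) >= self._high:
            self._source.pause()

    def drain(self, n=1):
        """Simulate the downstream consumer taking up to n batches."""
        drained = [self.queue.popleft() for _ in range(min(n, len(self.queue)))]
        if len(self.queue) <= self._low:
            self._source.resume()
        return drained


def run_scenario():
    source = GatedSource(range(10))
    node = BackpressureQueue(source, low=1, high=4)
    # The test, not the clock, decides when data flows.
    emitted = source.release(node, n=10)
    paused_after_fill = source.paused
    node.drain(3)
    resumed_after_drain = not source.paused
    return (emitted, paused_after_fill, resumed_after_drain,
            source.pause_count, source.resume_count)
```

Because there are no threads or sleeps, every run takes the same path: the source is asked for ten batches, stops after the fourth when the queue hits the high mark, and resumes once the consumer drains the queue down to the low mark.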


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

