rtpsw commented on PR #35874: URL: https://github.com/apache/arrow/pull/35874#issuecomment-1573862020
> Thinking out loud here:
>
> What we want to test is that if the throughput of the asof join node is lower than that of the source, then we pause the source. Two potential ways that I think we can reliably do this:
> (1) Add some sort of "debug options" to manipulate the behavior of asof join to make it run slower (i.e., sleep a few seconds before actually starting the work in the processing thread).
> (2) Add a downstream node to asof join that processes data slowly (similar to a slow data sink), i.e., processes one batch per second. This way, the backpressure would be pushed from the slow sink to asof join and then to the data sources.
>
> I think I prefer (2) a bit more because it represents a real-life case of a slow sink.
>
> @westonpace I am not sure if the idea of GatedSourceNode is similar or different, but happy to hear

While I'm not sure exactly what Weston has in mind, my understanding is that the GatedSourceNode's goal is to avoid flakiness due to non-deterministic timing. IMO, both (1) and (2) above could be flaky due to non-deterministic timing. Between (1) and (2), I also wouldn't prefer (1), because the debug options would change the behavior of the as-of-join node being tested, and I prefer to change the code driving it instead.
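
To make the timing concern concrete, here is a toy sketch in plain Python of what a gate-driven test could look like. It is not Arrow's API: `ToyBackpressureQueue`, `GatedSink`, `run_source`, and the threshold values are all hypothetical names and numbers chosen for illustration. The point is only that the test decides exactly when the consumer makes progress, so the pause and resume assertions do not depend on sleeps or wall-clock time.

```python
from collections import deque


class ToyBackpressureQueue:
    """Toy stand-in (hypothetical) for a node's input queue with pause/resume thresholds."""

    def __init__(self, pause_if_above, resume_if_below):
        self.pause_if_above = pause_if_above
        self.resume_if_below = resume_if_below
        self.batches = deque()
        self.source_paused = False
        self.pause_count = 0
        self.resume_count = 0

    def push(self, batch):
        self.batches.append(batch)
        # Ask the source to pause once the backlog exceeds the upper threshold.
        if not self.source_paused and len(self.batches) > self.pause_if_above:
            self.source_paused = True
            self.pause_count += 1

    def pop(self):
        batch = self.batches.popleft()
        # Let the source resume once the backlog drops below the lower threshold.
        if self.source_paused and len(self.batches) < self.resume_if_below:
            self.source_paused = False
            self.resume_count += 1
        return batch


class GatedSink:
    """Consumes only as many batches as the test explicitly releases."""

    def __init__(self, queue):
        self.queue = queue
        self.consumed = []

    def release(self, n):
        for _ in range(n):
            if not self.queue.batches:
                break
            self.consumed.append(self.queue.pop())


def run_source(queue, batches):
    """Push batches until the queue pauses the source; return what was not sent."""
    remaining = deque(batches)
    while remaining and not queue.source_paused:
        queue.push(remaining.popleft())
    return list(remaining)


queue = ToyBackpressureQueue(pause_if_above=4, resume_if_below=2)
sink = GatedSink(queue)

# Gate closed: the sink consumes nothing, so the source must end up paused.
leftover = run_source(queue, range(10))
paused_after_fill = queue.source_paused

# Open the gate for a known number of batches: the source must be resumed.
sink.release(4)
resumed_after_drain = not queue.source_paused

# The resumed source fills the queue again and gets paused a second time.
leftover = run_source(queue, leftover)
```

Because nothing here sleeps, the sequence of pause and resume events is the same on every run. A sink that processes one batch per second, as in (2), or a delay injected into the node, as in (1), would instead leave the pause point to the scheduler.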
