1. Yes. 2. I was going to say yes, but on closer examination it appears that it is not applying backpressure.

The SinkNode accumulates batches in a queue and applies backpressure. I thought we were using a sink node, since it is the normal "accumulate batches into a queue" sink. However, the Substrait<->Python integration is not using a SinkNode but a custom ConsumingSinkNode (the SubstraitSinkConsumer). The SubstraitSinkConsumer does accumulate batches in a queue (just like the sink node), but it does not handle backpressure. I've created [1] to track this.

[1] https://issues.apache.org/jira/browse/ARROW-18025
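To illustrate the difference in plain Python (a rough sketch, not Arrow's actual implementation; Acero signals the producer to pause/resume rather than blocking a put, but the effect is similar):

    import queue
    import threading

    # A queue with a maxsize models the SinkNode: once the consumer falls
    # behind by `maxsize` batches, the producer's put() blocks until the
    # consumer catches up.
    bounded = queue.Queue(maxsize=4)

    # An unlimited queue models the current SubstraitSinkConsumer: batches
    # accumulate without limit if nobody reads.
    unbounded = queue.Queue()

    def produce(q, n_batches=100):
        for i in range(n_batches):
            q.put(f"batch-{i}")  # blocks when a bounded queue is full
        q.put(None)              # sentinel: end of stream

    def consume(q):
        while True:
            batch = q.get()
            if batch is None:
                break
            # a slow reader would process each batch here

    t = threading.Thread(target=produce, args=(bounded,))
    t.start()
    consume(bounded)
    t.join()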
On Wed, Oct 12, 2022 at 9:02 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> Hello!
>
> I have some questions about how "pyarrow.substrait.run_query" works.
>
> Currently run_query returns a record batch reader. Since Acero is a
> push-based model and the reader is pull-based, I'd assume the reader
> object somehow accumulates the batches that are pushed to it. And I
> wonder:
>
> (1) Do the output batches keep accumulating in the reader object until
> someone reads from the reader?
> (2) Are there any backpressure mechanisms implemented to prevent OOM if
> data doesn't get pulled from the reader? (A bounded cache in the reader
> object?)
>
> Thanks,
> Li
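P.S. For anyone following along, the pull pattern under discussion, as a minimal sketch; `plan_bytes` and the loop body are placeholders, not part of the API:

    import pyarrow as pa
    import pyarrow.substrait as substrait

    # `plan_bytes` stands in for a serialized Substrait plan; producing
    # one is out of scope here.
    reader = substrait.run_query(pa.py_buffer(plan_bytes))

    # Acero pushes batches into the reader's internal queue as the plan
    # executes; nothing is consumed until the caller pulls:
    for batch in reader:
        ...  # (1) unread batches accumulate in the reader until pulled here
    # (2) and, per ARROW-18025, nothing currently throttles the producer
    # if this loop falls behind.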