1. Yes.
2. I was going to say yes, but on closer examination it appears that
it is not applying backpressure.
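To make (1) concrete, here is roughly what the pull side looks like
from Python (a minimal sketch; plan_buf stands in for a serialized
Substrait plan and process() for whatever the caller does with each
batch, both placeholders):

    import pyarrow.substrait as substrait

    # run_query starts the (push-based) Acero plan and returns a
    # pull-based RecordBatchReader
    reader = substrait.run_query(plan_buf)

    # until the caller pulls, pushed batches pile up in an internal
    # queue; iterating the reader drains it
    for batch in reader:
        process(batch)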

The SinkNode accumulates batches in a queue and applies backpressure.
I thought we were using a sink node, since it is the normal
"accumulate batches into a queue" sink.  However, the
Substrait<->Python integration is not using a sink node but instead a
ConsumingSinkNode with a custom consumer (SubstraitSinkConsumer).  The
SubstraitSinkConsumer does accumulate batches in a queue (just like
the sink node) but it does not handle backpressure.  I've created [1]
to track this.

[1] https://issues.apache.org/jira/browse/ARROW-18025
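For intuition, here is a toy sketch in plain Python of the difference
between the two sinks.  This is not Arrow's actual implementation
(which lives in C++); pause_producer / resume_producer stand in for
whatever hooks Acero uses to throttle upstream nodes, and the limit of
16 batches is arbitrary:

    import queue

    class UnboundedConsumer:
        # Like SubstraitSinkConsumer today: batches pile up without
        # limit, so a slow reader can OOM the process.
        def __init__(self):
            self.batches = queue.Queue()      # unbounded

        def consume(self, batch):
            self.batches.put(batch)           # never signals producer

        def pull(self):
            return self.batches.get()

    class BackpressureConsumer:
        # Like SinkNode: pauses the producer when the queue gets too
        # deep and resumes it once the reader has caught up.
        def __init__(self, pause_producer, resume_producer, limit=16):
            self.batches = queue.Queue()
            self.pause_producer = pause_producer
            self.resume_producer = resume_producer
            self.limit = limit
            self.paused = False

        def consume(self, batch):
            self.batches.put(batch)
            if not self.paused and self.batches.qsize() >= self.limit:
                self.paused = True
                self.pause_producer()   # tell the plan to stop pushing

        def pull(self):
            batch = self.batches.get()
            if self.paused and self.batches.qsize() <= self.limit // 2:
                self.paused = False
                self.resume_producer()  # reader caught up; push again
            return batch

The fix tracked by [1] is essentially to give SubstraitSinkConsumer
the second behavior.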

On Wed, Oct 12, 2022 at 9:02 AM Li Jin <ice.xell...@gmail.com> wrote:
>
> Hello!
>
> I have some questions about how "pyarrow.substrait.run_query" works.
>
> Currently run_query returns a record batch reader. Since Acero is a
> push-based model and the reader is pull-based, I'd assume the reader object
> somehow accumulates the batches that are pushed to it. And I wonder
>
> (1) Do the output batches keep accumulating in the reader object until
> someone reads from the reader?
> (2) Are there any back pressure mechanisms implemented to prevent OOM if
> data doesn't get pulled from the reader? (Bounded cache in the reader
> object?)
>
> Thanks,
> Li
