Hi,

Ivan and I were debugging some behavior of the source node this morning, and
I was hoping to confirm that our understanding is correct.

We observed the following when using a source node with a generator:
https://github.com/apache/arrow/blob/66c66d040bbf81a4819b276aee306625dc02837c/cpp/src/arrow/compute/exec/options.h#L54

The source node becomes "sequential" (batches come out in order, one at a
time) even with GetCpuThreadPool() attached to the plan.

We traced the code into this class:
https://github.com/apache/arrow/blob/78fb2edd30b602bd54702896fa78d36ec6fefc8c/cpp/src/arrow/util/async_generator.h#L316

It seems that, because of the synchronization in this class, it generates
batches sequentially. Is this understanding correct, and is it intentional
that source nodes are sequential when backed by a generator?
(This is actually the behavior that we want.)
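
To make the question concrete, here is a minimal sketch of the behavior as we
understand it. This is not Arrow code: SerializedGenerator and
DrainWithThreads are hypothetical names, and a plain mutex stands in for
whatever synchronization the real class uses. The point is only that when
producing a batch is guarded by a lock, batches come out one at a time and in
order, no matter how many threads are pulling.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a synchronized generator (NOT Arrow's class).
class SerializedGenerator {
 public:
  explicit SerializedGenerator(int num_batches) : num_batches_(num_batches) {
    assert(num_batches >= 0);
  }

  // Returns the next batch index, or std::nullopt once exhausted.
  // The lock ensures only one caller at a time can produce a batch.
  std::optional<int> Next() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (next_ >= num_batches_) return std::nullopt;
    int batch = next_++;
    emitted_.push_back(batch);  // record the production order
    return batch;
  }

  // Snapshot of the order in which batches were produced.
  std::vector<int> emitted() {
    std::lock_guard<std::mutex> lock(mutex_);
    return emitted_;
  }

 private:
  std::mutex mutex_;
  int next_ = 0;
  int num_batches_;
  std::vector<int> emitted_;
};

// Pulls from one generator on several threads, roughly as a thread pool
// would, and returns the order in which batches were produced.
std::vector<int> DrainWithThreads(int num_batches, int num_threads) {
  SerializedGenerator gen(num_batches);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([&gen] {
      while (gen.Next().has_value()) {
      }
    });
  }
  for (auto& w : workers) w.join();
  return gen.emitted();
}
```

In this sketch the production order is always 0, 1, 2, ... regardless of
num_threads, which matches what we see from the source node.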
