niebayes opened a new issue, #7248: https://github.com/apache/arrow-rs/issues/7248
We are building a database using the Arrow Flight SQL protocol, and we recently observed an interesting phenomenon: when we create a stream on the server side, the server polls it immediately instead of waiting for the client to start polling it.

After a lengthy debugging process, we found that this behavior comes from Tonic. The Arrow Flight SQL protocol is based on gRPC, and arrow-rs uses Tonic to build the gRPC service. When we send a stream to another process through Tonic, Tonic creates a unidirectional stream and spawns a task that continuously polls it as soon as it is created, even if the client has not started polling it yet.

We believe this behavior does not align with the design of the Arrow Flight SQL protocol. A query typically goes through two phases: `get_flight_info` and `do_get_statement`. In the first phase we create the stream; in the second phase the client retrieves the stream from the server and consumes it. We expect the stream to start being consumed only in the second phase, but in practice it is consumed earlier.

This also matters for our distributed query engine, which is built on DataFusion. The frontend sends the execution plan to other nodes; each backend executes it and returns a stream along with its statistics. The frontend then optimizes the plan using those statistics. For example, a `count(*)` query can be answered directly from the `num_rows` stored in the statistics, so the backend stream can simply be discarded. Because Tonic eagerly consumes the stream without waiting for the client to poll it, the backend still does this unnecessary work.

We hope arrow-rs can provide a way to consume a server-side stream lazily, so that it only starts being consumed when the client polls it.
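To make the eager-versus-lazy distinction concrete, here is a std-only Rust analogy. It is not Tonic's or arrow-rs's code: a background thread and a bounded channel stand in for the spawned task and the outgoing response, and `batch_source` and `drive_eagerly` are hypothetical names used only for illustration. A counter records how many items were actually produced, so the `count(*)` case (stream created, then discarded unread) can be compared under both models.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a server-side record-batch stream. Producing each
/// item increments `produced`, so we can observe how much work was done.
fn batch_source(produced: Arc<AtomicUsize>, total: usize) -> impl Iterator<Item = usize> {
    (0..total).map(move |i| {
        produced.fetch_add(1, Ordering::SeqCst);
        i
    })
}

/// Eager driving (analogy for the spawned task): a background thread starts
/// pulling items as soon as the source is handed over and keeps going until
/// its bounded buffer of `buffer` items is full, whether or not anyone reads.
fn drive_eagerly<I>(source: I, buffer: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(buffer);
    thread::spawn(move || {
        for item in source {
            // `send` fails once the receiver is dropped; stop producing then.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}

fn main() {
    // Eager: items are produced even though nobody reads the stream.
    let eager_count = Arc::new(AtomicUsize::new(0));
    let rx = drive_eagerly(batch_source(Arc::clone(&eager_count), 100), 2);
    thread::sleep(Duration::from_millis(100)); // give the driver thread time to run
    drop(rx); // e.g. count(*) was answered from statistics instead
    println!("eager: {} items produced, none read", eager_count.load(Ordering::SeqCst));

    // Pull-driven: nothing is produced until the consumer asks for an item.
    let lazy_count = Arc::new(AtomicUsize::new(0));
    let stream = batch_source(Arc::clone(&lazy_count), 100);
    drop(stream); // discarded without ever being polled
    println!("lazy: {} items produced", lazy_count.load(Ordering::SeqCst));
}
```

In the pull-driven model the consumer's request is what triggers work, which is the behavior we would like for server-side Flight streams; in the eager model the producer runs ahead by up to the buffer size before anyone asks.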