niebayes opened a new issue, #7248:
URL: https://github.com/apache/arrow-rs/issues/7248

   We are building a database using the Arrow Flight SQL protocol, and we 
recently observed unexpected behavior: when we create a stream on the server 
side, the server polls it immediately instead of waiting for the client to 
start polling it. 
   After a lengthy debugging process, we discovered that this behavior is 
triggered by Tonic. The Arrow Flight SQL protocol is based on gRPC, and in the 
arrow-rs project, we use Tonic to build the gRPC service. When we create a 
cross-process stream through Tonic, Tonic actually creates a unidirectional 
stream. At the same time, Tonic spawns a task to continuously poll this stream 
as soon as it is created, even if the client has not started polling it yet. 
   We believe this behavior does not align with the design of the Arrow Flight 
SQL protocol. Typically, a query goes through two phases: `get_flight_info` and 
`do_get_statement`. In the first phase, we create the stream, and in the 
second phase, we retrieve the stream from the server and consume it on the 
client side. We expect the stream to start being consumed only in the second 
phase, but in reality, this is not the case. 
   In addition, our database implements distributed queries on top of 
DataFusion. The frontend sends the execution plan to other nodes, the backend 
executes it and returns a stream along with the statistics. The frontend then 
optimizes the plan. For example, a `count(*)` query is optimized to return the 
`num_rows` stored in the statistics directly, so the backend-side stream can 
simply be discarded. However, Tonic eagerly consumes the stream without 
waiting for the client to poll it, which is unnecessary work in this case.
   We hope that arrow-rs can provide a way to lazily consume the stream created 
on the server side, meaning it only starts being consumed when the client polls 
it.
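
   To illustrate the kind of laziness we have in mind, here is a minimal sketch. 
It uses a plain `std` `Iterator` as a stand-in for an async stream so that it 
has no dependencies; it does not use any Tonic or arrow-flight API. The names 
`LazyIter` and `demo` are hypothetical and exist only for this example. The 
idea is that the expensive work (the factory closure) does not start when the 
wrapper is created, only when the consumer first asks for an item.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical wrapper: holds a factory and builds the inner iterator
/// only when `next()` is called for the first time.
struct LazyIter<F, I> {
    factory: Option<F>,
    inner: Option<I>,
}

impl<F, I> LazyIter<F, I>
where
    F: FnOnce() -> I,
    I: Iterator,
{
    fn new(factory: F) -> Self {
        // Nothing is executed here; the factory is only stored.
        LazyIter { factory: Some(factory), inner: None }
    }
}

impl<F, I> Iterator for LazyIter<F, I>
where
    F: FnOnce() -> I,
    I: Iterator,
{
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        if self.inner.is_none() {
            // First pull from the consumer: start the real work now.
            let factory = self.factory.take()?;
            self.inner = Some(factory());
        }
        self.inner.as_mut()?.next()
    }
}

/// Returns (times started before consuming, times started after, rows).
fn demo() -> (usize, usize, Vec<i32>) {
    let started = Rc::new(Cell::new(0usize));
    let counter = Rc::clone(&started);

    // Stand-in for "execute the plan and produce record batches".
    let lazy = LazyIter::new(move || {
        counter.set(counter.get() + 1);
        vec![1, 2, 3].into_iter()
    });

    // The wrapper exists, but the factory has not run yet.
    let before = started.get();
    let rows: Vec<i32> = lazy.collect();
    (before, started.get(), rows)
}
```

   In this sketch, dropping `lazy` without ever calling `next()` means the 
factory never runs, which is the behavior we would like for a stream that the 
frontend decides to discard. A real solution would need the async equivalent 
and would have to account for how Tonic drives the response stream.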