Re: [DISCUSS] Protocol for exchanging Arrow data over REST APIs

Ian Cook Sat, 18 Nov 2023 10:52:12 -0800

Hi Kou,

I think it is too early to make a specific proposal. I hope to use this
discussion to collect more information about existing approaches. If
several viable approaches emerge from this discussion, then I think we
should make a document listing them, like you suggest.


Thank you for the information about Groonga. This type of straightforward
HTTP-based approach would work in the context of a REST API, as I
understand it.

But how is the performance? Have you measured the throughput of this
approach to see if it is comparable to using Flight SQL? Is this approach
able to saturate a fast network connection?

And what about the case in which the server wants to begin sending batches
to the client before the total number of result batches / records is known?
Would this approach work in that case? I think so but I am not sure.
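
To make the question concrete: with HTTP/1.1 chunked transfer coding, the body is a sequence of length-prefixed chunks terminated by a zero-length chunk, so no total length has to be declared up front. The following is a minimal sketch of that framing only. The function name and the stand-in byte strings are hypothetical; in practice each payload would be a serialized record batch, and a real HTTP server library would normally do this framing itself.

```python
from typing import Iterable, Iterator


def chunked_frames(payloads: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each payload in HTTP/1.1 chunked transfer coding framing.

    Payloads are consumed lazily, so the total number of payloads
    does not need to be known when the first chunk is emitted.
    """
    for payload in payloads:
        if not payload:
            # A zero-length chunk marks the end of the body, so skip empties.
            continue
        # Chunk = size in hex, CRLF, data, CRLF.
        yield b"%X\r\n" % len(payload) + payload + b"\r\n"
    # Last chunk (size 0) followed by an empty trailer section.
    yield b"0\r\n\r\n"


# Stand-in bytes (assumption): real payloads would be serialized batches.
batches = (b for b in [b"batch-1", b"batch-22"])
body = b"".join(chunked_frames(batches))
print(body)
```

Because the input is a generator, the first chunk can be written to the socket before later batches have been produced.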

If this HTTP-based type of approach is sufficiently performant and it works
in a sufficient proportion of the envisioned use cases, then perhaps the
proposed spec / protocol could be based on this approach. If so, then we
could refocus this discussion on which best practices to incorporate /
recommend, such as:
- server should not return the result data in the body of a response to a
query request; instead server should return a response body that gives
URI(s) at which clients can GET the result data
- transmit result data in chunks (Transfer-Encoding: chunked), with
recommendations about chunk size
- support range requests (Accept-Ranges: bytes) to allow clients to request
result ranges (or not?)
- recommendations about compression
- recommendations about TCP receive window size
- recommendation to open multiple TCP connections on very fast networks
(e.g. >25 Gbps) where a CPU thread could be the throughput bottleneck
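
As an illustration of the first and third items in the list above, here is a rough sketch using only the Python standard library. Everything named in it is an assumption made for the example: the "result_uris" field, the /results/... path layout, and the decision to handle only a single byte range. It is not a proposed format.

```python
import json
import re
from typing import Optional, Tuple

# Matches a single byte range such as "bytes=0-99", "bytes=100-" or "bytes=-50".
_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)", re.ASCII)


def query_response_body(result_id: str, n_parts: int) -> str:
    """Build a query response that points at the result data by URI.

    The field name and URI layout are hypothetical.
    """
    uris = [f"/results/{result_id}/part/{i}" for i in range(n_parts)]
    return json.dumps({"result_uris": uris})


def parse_byte_range(value: str, size: int) -> Optional[Tuple[int, int]]:
    """Resolve a Range header value to an inclusive (start, end) pair.

    Returns None for anything this sketch does not handle: multiple
    ranges, other units, malformed values, or unsatisfiable ranges.
    """
    m = _RANGE_RE.fullmatch(value.strip())
    if m is None or size <= 0:
        return None
    first, last = m.groups()
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the representation.
        if last == "" or int(last) == 0:
            return None
        return (max(size - int(last), 0), size - 1)
    start = int(first)
    end = int(last) if last else size - 1
    if start >= size or end < start:
        return None
    # An end position past the last byte is clamped to the last byte.
    return (start, min(end, size - 1))


print(query_response_body("abc", 2))
print(parse_byte_range("bytes=900-", 1000))
```

A server that advertises Accept-Ranges: bytes could use a helper like parse_byte_range to decide which slice of a stored result to return with a 206 response. Whether byte ranges are the right granularity for result data is exactly the open "(or not?)" question.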

On the other hand, if the performance and functionality of this HTTP-based
type of approach are not sufficient, then we might consider fundamentally
different approaches.

Ian
