Several recent discussions have highlighted the lack of an established
specification / protocol for sending Arrow-formatted data through REST
APIs. I would like to start a discussion here to gauge interest and gather
ideas about this.

For background:

Flight RPC provides a framework for building RPC APIs that exchange
Arrow-formatted data. The two salient facts about Flight RPC are:

(1) It uses the Arrow format as its data serialization format.
(2) It is an RPC framework, built on gRPC, with HTTP/2 as the transfer
protocol.

Both of these design choices were made to optimize performance. But over
time, we've seen that much of the performance benefit can be achieved with
(1) alone. There are examples of REST services that exchange
Arrow-formatted data with HTTP/1.1 as the transfer protocol and still
achieve very good performance. Often this is the most viable approach,
particularly in the case where there's a requirement to build on top of an
existing REST API instead of building a new RPC API.

But since there is no standard protocol for implementing exchange of
Arrow-formatted data in REST services, we see different REST APIs
implementing this in different ways. The implementations are bespoke and
incompatible, they might be designed sub-optimally, and developer time is
wasted writing custom code in different languages / libraries to exchange
Arrow-formatted data across these different REST APIs.

I think it would make sense for the Arrow project to establish a standard
protocol for this. I believe this would accelerate adoption of Arrow as a
format for exchanging data across REST APIs. It would increase convenience
and compatibility and reduce implementation complexity.

Compared to Flight RPC or Flight SQL, this protocol would be much smaller
in scope. It could consist only of a specification for how to implement
support for exchanging Arrow-formatted data in an existing REST API.
Services that implemented this would wrap it in their own REST APIs. This
protocol specification would be concerned only with the subset of the API
that involved sending/receiving Arrow-formatted data.
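
To illustrate how small that subset could be, here is a rough sketch of
one piece an existing REST endpoint might need: choosing between its
current JSON representation and an Arrow IPC stream based on the request's
Accept header. This is only an illustration under assumptions of mine. The
helper name pick_response_format, the use of Accept-based negotiation, and
the fallback to JSON are all hypothetical and not part of any agreed
design. The media type application/vnd.apache.arrow.stream is the one
registered for the Arrow IPC streaming format.

```python
# Media type registered for the Arrow IPC streaming format.
ARROW_STREAM = "application/vnd.apache.arrow.stream"
JSON = "application/json"


def pick_response_format(accept_header, default=JSON):
    """Return the media type to respond with for a given Accept header.

    Hypothetical helper (not an Arrow API): picks the supported type with
    the highest q-value and falls back to `default` when nothing matches.
    """
    supported = (ARROW_STREAM, JSON)
    best_type, best_q = None, 0.0
    for item in (accept_header or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        # Parse the optional quality value, e.g. ";q=0.5" (defaults to 1.0).
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        # Wildcards map to the endpoint's existing default representation.
        if media_type in ("*/*", "application/*"):
            candidate = default
        elif media_type in supported:
            candidate = media_type
        else:
            continue
        if q > best_q:
            best_type, best_q = candidate, q
    return best_type or default
```

A handler would call this once per request, set Content-Type to the result,
and write either its usual JSON body or the bytes produced by an Arrow
library's IPC stream writer. A stricter service might return 406 instead of
falling back to JSON when no listed type is supported.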

Input appreciated from anyone in the community who might be interested in
using this or contributing to this.

Ian
