Hi,

Could you provide a draft of the specification/protocol so
that it is easier to discuss? Or can we list existing
approaches (and their pros/cons)?

I'm not sure what scope we want the specification/protocol
to cover.


I'm developing Groonga, an HTTP-based (but not REST) full
text search server, and it uses the following protocol:

* If a client wants to receive a response as Apache Arrow
  data, it uses a URL of the form http://.../COMMAND.arrows.
  * "arrows" is the standard extension for the Apache Arrow
    streaming format, which we've registered with IANA.
    * https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
    * https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
* If a server needs to send a response as Apache Arrow data:
  * It uses the "Content-Type: application/vnd.apache.arrow.stream"
    HTTP header.
    * The media type is the standard media type for the
      Apache Arrow streaming format, which we've registered
      with IANA.
  * The HTTP body is data in the Apache Arrow streaming format.
  * It sends multiple pieces of data, each in the Apache Arrow
    streaming format, because it needs to send data with
    different schemas.
    * A client reads them by creating multiple record batch
      stream readers. Python example:
      https://github.com/hhatto/poyonga/blob/b8c2a2ba9fdbb8250d1d3d4db64137a9781fb7b7/poyonga/result.py#L97-L101
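
As a rough illustration of the client side of the protocol above, here is a minimal Python sketch. It is not Groonga's or poyonga's actual code. The helper names (arrows_url, is_arrow_stream, read_arrow_streams), the example.com base URL, and the query parameters are hypothetical. It also assumes that the multiple Arrow streams arrive concatenated in a single response body, and that pyarrow is installed for the reading step.

```python
from urllib.parse import urlencode

# Media type registered with IANA for the Apache Arrow streaming format.
ARROW_STREAM_MEDIA_TYPE = "application/vnd.apache.arrow.stream"


def arrows_url(base_url, command, params=None):
    """Build a COMMAND.arrows URL (base_url and params are hypothetical)."""
    url = f"{base_url.rstrip('/')}/{command}.arrows"
    if params:
        url += "?" + urlencode(params)
    return url


def is_arrow_stream(content_type):
    """Return True if a Content-Type header value names the Arrow stream type."""
    # Ignore any parameters after ";" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == ARROW_STREAM_MEDIA_TYPE


def read_arrow_streams(body):
    """Read every Arrow stream in body, assuming they are concatenated."""
    # Third-party dependency, imported here so that the two helpers above
    # need only the standard library.
    import pyarrow as pa

    source = pa.BufferReader(body)
    tables = []
    # Open a new record batch stream reader at the current offset until
    # the whole body has been consumed. Each reader has its own schema.
    while source.tell() < source.size():
        reader = pa.ipc.open_stream(source)
        tables.append(reader.read_all())
    return tables
```

A client would request arrows_url("http://example.com/d", "select"), check the response's Content-Type with is_arrow_stream(), and then pass the raw body bytes to read_arrow_streams() to get one table per stream.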


Thanks,
-- 
kou

In <cana9gthar5jyerlpoakh6ovgwu25apjh6+q4dlq9t15m9_3...@mail.gmail.com>
  "[DISCUSS] Protocol for exchanging Arrow data over REST APIs" on Fri, 17 Nov 2023 16:13:39 -0500,
  Ian Cook <ianmc...@apache.org> wrote:

> Several recent discussions have highlighted the lack of an established
> specification / protocol for sending Arrow-formatted data through REST
> APIs. I would like to start a discussion here to gauge interest and gather
> ideas about this.
> 
> For background:
> 
> Flight RPC provides a framework for building RPC APIs that exchange
> Arrow-formatted data. The two salient facts about Flight RPC are:
> 
> (1) It uses the Arrow format as its data serialization format.
> (2) It is an RPC framework, built on gRPC, with HTTP/2 as the transfer
> protocol.
> 
> Both of these design choices were made to optimize performance. But over
> time, we've seen that much of the performance benefit can be achieved with
> (1) alone. We've seen examples of REST services that exchange
> Arrow-formatted data with HTTP/1.1 as the transfer protocol and manage to
> achieve very good performance. Often this is the most viable approach,
> particularly in the case where there's a requirement to build on top of an
> existing REST API instead of building a new RPC API.
> 
> But since there is no standard protocol for implementing exchange of
> Arrow-formatted data in REST services, we see different REST APIs
> implementing this in different ways. The implementations are bespoke and
> incompatible, they might be designed sub-optimally, and developer time is
> wasted writing custom code in different languages / libraries to exchange
> Arrow-formatted data across these different REST APIs.
> 
> I think it would make sense for the Arrow project to establish a standard
> protocol for this. I believe this would accelerate adoption of Arrow as a
> format for exchanging data across REST APIs. It would increase convenience
> and compatibility and reduce implementation complexity.
> 
> Compared to Flight RPC or Flight SQL, this protocol would be much smaller
> in scope. It could consist only of a specification for how to implement
> support for exchanging Arrow-formatted data in an existing REST API.
> Services that implemented this would wrap it in their own REST APIs. This
> protocol specification would be concerned only with the subset of the API
> that involved sending/receiving Arrow formatted data.
> 
> Input appreciated from anyone in the community who might be interested in
> using this or contributing to this.
> 
> Ian
