Hi,

Could you provide a draft of the specification/protocol so
that it's easier to discuss? Or could we list existing
approaches (and their pros/cons)?
I'm not sure what range we want to cover with the
specification/protocol.

I'm developing Groonga, an HTTP-based (but not REST)
full-text search server, and it uses the following protocol:

* If a client wants to receive a response as Apache Arrow
  data, it uses an http://.../COMMAND.arrows URL.
  * "arrows" is the standard extension for the Apache Arrow
    streaming format, which we've registered with IANA.
    * https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format
    * https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
* If a server needs to send a response as Apache Arrow data:
  * It uses the "Content-Type: application/vnd.apache.arrow.stream"
    HTTP header.
    * The media type is the standard media type for the Apache
      Arrow streaming format, which we've registered with IANA.
  * The HTTP body is Apache Arrow streaming format data.
  * It sends multiple Apache Arrow streams in one body because it
    needs to send multiple results with different schemas.
    * A client reads them by creating multiple record batch
      stream readers. Python example:
      https://github.com/hhatto/poyonga/blob/b8c2a2ba9fdbb8250d1d3d4db64137a9781fb7b7/poyonga/result.py#L97-L101

Thanks,
--
kou

In <cana9gthar5jyerlpoakh6ovgwu25apjh6+q4dlq9t15m9_3...@mail.gmail.com>
  "[DISCUSS] Protocol for exchanging Arrow data over REST APIs" on Fri, 17 Nov 2023 16:13:39 -0500,
  Ian Cook <ianmc...@apache.org> wrote:

> Several recent discussions have highlighted the lack of an established
> specification / protocol for sending Arrow-formatted data through REST
> APIs. I would like to start a discussion here to gauge interest and gather
> ideas about this.
>
> For background:
>
> Flight RPC provides a framework for building RPC APIs that exchange
> Arrow-formatted data. The two salient facts about Flight RPC are:
>
> (1) It uses the Arrow format as its data serialization format.
> (2) It is an RPC framework, built on gRPC, with HTTP/2 as the transfer
> protocol.
>
> Both of these design choices were made to optimize performance. But over
> time, we've seen that much of the performance benefit can be achieved with
> (1) alone. We've seen examples of REST services that exchange
> Arrow-formatted data with HTTP/1.1 as the transfer protocol and manage to
> achieve very good performance. Often this is the most viable approach,
> particularly in the case where there's a requirement to build on top of an
> existing REST API instead of building a new RPC API.
>
> But since there is no standard protocol for implementing exchange of
> Arrow-formatted data in REST services, we see different REST APIs
> implementing this in different ways. The implementations are bespoke and
> incompatible, they might be designed sub-optimally, and developer time is
> wasted writing custom code in different languages / libraries to exchange
> Arrow-formatted data across these different REST APIs.
>
> I think it would make sense for the Arrow project to establish a standard
> protocol for this. I believe this would accelerate adoption of Arrow as a
> format for exchanging data across REST APIs. It would increase convenience
> and compatibility and reduce implementation complexity.
>
> Compared to Flight RPC or Flight SQL, this protocol would be much smaller
> in scope. It could consist only of a specification for how to implement
> support for exchanging Arrow-formatted data in an existing REST API.
> Services that implemented this would wrap it in their own REST APIs. This
> protocol specification would be concerned only with the subset of the API
> that involved sending/receiving Arrow formatted data.
>
> Input appreciated from anyone in the community who might be interested in
> using this or contributing to this.
>
> Ian