ianmcook commented on code in PR #35:
URL: https://github.com/apache/arrow-experiments/pull/35#discussion_r1843885375
##########
http/get_compressed/README.md:
##########
@@ -20,3 +20,160 @@
# HTTP GET Arrow Data: Compression Examples
This directory contains examples of HTTP servers/clients that transmit/receive
data in the Arrow IPC streaming format and use compression (in various ways) to
reduce the size of the transmitted data.
+
+Since we re-use the [Arrow IPC format][ipc] for transferring Arrow data over
+HTTP and both Arrow IPC and HTTP standards support compression on their own,
+there are at least two approaches to this problem:
+
+1. Compressed HTTP responses carrying Arrow IPC streams with uncompressed
+ array buffers.
+2. Uncompressed HTTP responses carrying Arrow IPC streams with compressed
+ array buffers.
+
+Applying both IPC buffer and HTTP compression to the same data is not
+recommended. The extra CPU overhead of decompressing the data twice is
+not worth any possible gains that double compression might bring. If
+compression ratios are unambiguously more important than reducing CPU
+overhead, then a different compression algorithm that optimizes for that can
+be chosen.
+
+This table shows the support for different compression algorithms in HTTP and
+Arrow IPC:
+
+| Format | HTTP Support | IPC Support |
+| ------------------ | --------------- | --------------- |
+| gzip (GZip) | X | |
+| deflate (DEFLATE) | X | |
+| br (Brotli) | X[^2] | |
+| zstd (Zstandard) | X[^2] | X |
+| lz4 (LZ4) | | X |
+
+Since not all Arrow IPC implementations support compression, HTTP compression
+based on accepted formats negotiated with the client is a great way to increase
+the chances of efficient data transfer.
+
+Servers may check the `Accept-Encoding` header of the client and choose the
+compression format in this order of preference: `zstd`, `br`, `gzip`,
+`identity` (no compression). If the client does not specify a preference, the
+only constraint on the server is the availability of the compression algorithm
+in the server environment.
+
+## Arrow IPC Compression
+
+When IPC buffer compression is preferred and servers can't assume all clients
+support it[^3], clients may be asked to explicitly list the supported compression
+algorithms in the request headers. The `Accept` header can be used for this
+since `Accept-Encoding` (and `Content-Encoding`) is used to control compression
+of the entire HTTP response stream and instruct HTTP clients (like browsers) to
+decompress the response before giving data to the application or saving the
+data.
+
+ Accept: application/vnd.apache.arrow.ipc; codecs="zstd, lz4"
+
+This is similar to clients requesting video streams by specifying the
+container format and the codecs they support
+(e.g. `Accept: video/webm; codecs="vp8, vorbis"`).
+
+The server is allowed to choose any of the listed codecs, or not compress the
+IPC buffers at all. Uncompressed IPC buffers should always be acceptable to
+clients.
+
+If a server adopts this approach and a client does not specify any codecs in
+the `Accept` header, the server can fall back to checking the `Accept-Encoding`
+header to pick a compression algorithm for the entire HTTP response stream.
+
+To make debugging easier, servers may include the chosen compression codec(s)
+in the `Content-Type` header of the response (quotes are optional):
+
+ Content-Type: application/vnd.apache.arrow.ipc; codecs=zstd
Review Comment:
```suggestion
Content-Type: application/vnd.apache.arrow.stream; codecs=zstd
```
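For context, the negotiation described in the quoted section could be sketched roughly as below. This is only an illustration under stated assumptions: the function names (`pick_http_coding`, `pick_ipc_codec`, `content_type`) are hypothetical and not taken from the PR, the `codecs` parameter is the convention proposed in the README text rather than a registered media type parameter, and the `Accept` parsing is simplified to look at the first `codecs` parameter it finds.

```python
import re

# Server-side preference order for whole-response compression (from the README text).
HTTP_CODING_PREFERENCE = ["zstd", "br", "gzip", "identity"]
# IPC buffer codecs listed in the README table.
IPC_CODECS = ["zstd", "lz4"]


def pick_http_coding(accept_encoding, available=HTTP_CODING_PREFERENCE):
    """Pick a content coding for the response, or None if nothing is acceptable."""
    if not accept_encoding:
        # No client preference: only server-side availability constrains the choice.
        return available[0]
    accepted = {}
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        name = parts[0].strip().lower()
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if name:
            accepted[name] = q
    for coding in available:
        # Fall back to the wildcard entry when the coding is not listed explicitly.
        q = accepted.get(coding, accepted.get("*"))
        if q is None:
            # identity stays acceptable unless the client excludes it.
            if coding == "identity":
                return coding
            continue
        if q > 0:
            return coding
    return None


def pick_ipc_codec(accept, supported=IPC_CODECS):
    """Pick an IPC buffer codec from the `codecs` parameter of an Accept header."""
    match = re.search(r'codecs\s*=\s*(?:"([^"]*)"|([^;,\s]+))', accept or "", re.IGNORECASE)
    if not match:
        return None
    raw = match.group(1) or match.group(2) or ""
    requested = [codec.strip().lower() for codec in raw.split(",")]
    for codec in supported:
        if codec in requested:
            return codec
    return None


def content_type(codec, media_type="application/vnd.apache.arrow.stream"):
    """Build the response Content-Type, naming the chosen codec when there is one."""
    return f"{media_type}; codecs={codec}" if codec else media_type
```

A server following the README's fallback rule would call `pick_ipc_codec` first and consult `pick_http_coding` only when it returns `None`, so that IPC buffer compression and HTTP compression are never applied to the same response.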