felipecrv commented on code in PR #35:
URL: https://github.com/apache/arrow-experiments/pull/35#discussion_r1767794396


##########
http/get_compressed/README.md:
##########
@@ -20,3 +20,144 @@
 # HTTP GET Arrow Data: Compression Examples
 
 This directory contains examples of HTTP servers/clients that transmit/receive data in the Arrow IPC streaming format and use compression (in various ways) to reduce the size of the transmitted data.
+
+Since we re-use the [Arrow IPC format][ipc] for transferring Arrow data over
+HTTP and both Arrow IPC and HTTP standards support compression on their own,
+there are at least two approaches to this problem:
+
+1. Compressed HTTP responses carrying Arrow IPC streams with uncompressed
+   array buffers.
+2. Uncompressed HTTP responses carrying Arrow IPC streams with compressed
+   array buffers.
+
+Applying IPC buffer and HTTP compression at the same time is not recommended. The
+extra CPU overhead of decompressing the data twice is not worth any possible
+gains that double compression might bring. If compression ratios are
+unambiguously more important than reducing CPU overhead, then a different
+compression algorithm that optimizes for that can be chosen.
+
+This table shows the support for different compression algorithms in HTTP and
+Arrow IPC:
+
+| Format             | HTTP Support    | IPC Support     |
+| ------------------ | --------------- | --------------- |
+| gzip (GZip)        | X               |                 |
+| deflate (DEFLATE)  | X               |                 |
+| br (Brotli)        | X[^2]           |                 |
+| zstd (Zstandard)   | X[^2]           | X               |
+| lz4 (LZ4)          |                 | X               |
+
+Since not all Arrow IPC implementations support compression, HTTP compression
+based on accepted formats negotiated with the client is a great way to increase
+the chances of efficient data transfer.
+
+Servers may check the `Accept-Encoding` header of the client and choose the
+compression format in this order of preference: `zstd`, `br`, `gzip`,
+`identity` (no compression). If the client does not specify a preference, the
+only constraint on the server is the availability of the compression algorithm
+in the server environment.
+
+## Arrow IPC Compression
+
+When IPC buffer compression is preferred and servers can't assume all clients
+support it[^3], clients may be asked to explicitly list the supported compression
+algorithms in the request headers. The `Accept` header can be used for this
+since `Accept-Encoding` (and `Content-Encoding`) is used to control compression
+of the entire HTTP response stream and instruct HTTP clients (like browsers) to
+decompress the response before giving data to the application or saving the
+data.
+
+    Accept: application/vnd.apache.arrow.ipc; codecs="zstd, lz4"

Review Comment:
   Done.
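
For reference, the two negotiation steps described in the quoted hunk could be sketched as follows. This is a minimal illustration using only the Python standard library, not code from this PR: the function names (`choose_content_encoding`, `parse_ipc_codecs`, `split_outside_quotes`) are hypothetical, the server's order of preference always wins over client q-value ordering (q-values are only used to rule codings in or out), and escaped quotes inside parameter values are not handled.

```python
# Server-side order of preference, as listed in the README text.
HTTP_CODING_PREFERENCE = ("zstd", "br", "gzip", "identity")
ARROW_IPC_MEDIA_TYPE = "application/vnd.apache.arrow.ipc"


def parse_accept_encoding(header):
    """Map each coding in an Accept-Encoding value to its q-value."""
    weights = {}
    for item in header.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        weights[name] = q
    return weights


def choose_content_encoding(header, available=frozenset(HTTP_CODING_PREFERENCE)):
    """Pick the first server-preferred coding that the client accepts.

    `header` is the Accept-Encoding value, or None if the header is absent.
    `available` is the set of codings the server environment can produce.
    """
    # No header: the client states no preference, so anything goes.
    weights = {"*": 1.0} if header is None else parse_accept_encoding(header)
    for coding in HTTP_CODING_PREFERENCE:
        if coding not in available:
            continue
        if coding in weights:
            q = weights[coding]
        elif "*" in weights:
            q = weights["*"]
        else:
            # Unlisted codings are rejected, except identity, which stays
            # acceptable unless it is explicitly excluded.
            q = 1.0 if coding == "identity" else 0.0
        if q > 0:
            return coding
    return None  # nothing acceptable; a server might answer 406 here


def split_outside_quotes(text, sep):
    """Split `text` on `sep`, ignoring separators inside double quotes."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_ipc_codecs(accept_header):
    """Return the codecs listed for the Arrow IPC media type, in order."""
    for media_range in split_outside_quotes(accept_header, ","):
        media_type, *params = split_outside_quotes(media_range, ";")
        if media_type.strip().lower() != ARROW_IPC_MEDIA_TYPE:
            continue
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "codecs":
                names = value.strip().strip('"').split(",")
                return [n.strip().lower() for n in names if n.strip()]
    return []
```

With the header value from the hunk, `parse_ipc_codecs('application/vnd.apache.arrow.ipc; codecs="zstd, lz4"')` yields the two codec names in the order the client listed them, and `choose_content_encoding("gzip, br")` selects `br` because the server ranks it above `gzip`.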


