felipecrv commented on PR #35:
URL: https://github.com/apache/arrow-experiments/pull/35#issuecomment-2354319850

> * Using Arrow IPC buffer compression introduces less decompression latency than using HTTP compression. There might be a Python overhead in these HTTP compression examples because the buffer compression happens completely inside the C++ layer and the HTTP examples connect different pyarrow classes. This is still a merit of the IPC buffer compression since Python might be present on both client and server.

Buffer compression is really beneficial to the IPC stream parser. The numbers above look very good.

> * If it's not an option to use Arrow IPC buffer compression (e.g. because it's not implemented in the Arrow library you're using), then:
>   * If the network is very fast and data transfer costs are not a concern at all, don't use any HTTP compression.

I would recommend `zstd`. The network has to be very fast and reliable for `zstd` to not be helpful.

>   * If the network is fairly fast and data transfer costs are not a major concern, zstd is often the best all-around balanced option (but YMMV, so try it yourself in your real-world environment on a representative sample of datasets).

I would emphasize the `zstd` recommendation more. Almost no network is as reliable as the loopback interface at 127.0.0.1 :D

>   * If the network is slower or data transfer costs are a major concern, try experimenting with other HTTP compression codecs.

Indeed.
> * Using Arrow IPC buffer compression introduces less decompression latency than using HTTP compression. There might be a Python overhead in these HTTP compression examples because the buffer compression happens completely inside the C++ layer and the HTTP examples connect different pyarrow classes. This is still a merit of the IPC buffer compression since Python might be present on both client and server. Buffer compression is really beneficial to the IPC stream parser. The numbers above look very good. > * If it's not an option to use Arrow IPC buffer compression (e.g. because it's not implemented in the Arrow library you're using), then: > > * If the network is very fast and data transfer costs are not a concern at all, don't use any HTTP compression. I would recommend `zstd`. The network has to be very fast and reliable for `zstd` to not be helpful. > * If the network is fairly fast and data transfer costs are not a major concern, zstd is often the best all-around balanced option (but YMMV so try it yourself in your real-world environment on a representative sample of datasets). I would emphasize the `zstd` recommendation more. Almost no network is as reliable as the loopback interface at 127.0.0.1 :D > * If the network is slower or data transfer costs are a major concern, try experimenting with other HTTP compression codecs. Indeed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org