felipecrv commented on PR #35:
URL: https://github.com/apache/arrow-experiments/pull/35#issuecomment-2354319850

   > * Using Arrow IPC buffer compression introduces less decompression latency 
than using HTTP compression.
   
   There might be Python overhead in these HTTP compression examples: the IPC 
buffer compression happens entirely inside the C++ layer, while the HTTP 
examples wire together different pyarrow classes in Python. That is still a 
point in favor of IPC buffer compression, since Python may well be present on 
both client and server in real deployments.
   
   Buffer compression is really beneficial to the IPC stream parser. The 
numbers above look very good.
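
   For concreteness, here is a minimal sketch of enabling IPC buffer compression from Python (the table contents are made up; any Arrow table works):

   ```python
   import io

   import pyarrow as pa
   import pyarrow.ipc as ipc

   # Made-up example table
   table = pa.table({"x": list(range(100_000))})

   # Per-buffer zstd compression is a single writer option
   options = ipc.IpcWriteOptions(compression="zstd")

   sink = io.BytesIO()
   with ipc.new_stream(sink, table.schema, options=options) as writer:
       writer.write_table(table)

   # Readers detect compressed buffers automatically; no option needed
   result = ipc.open_stream(sink.getvalue()).read_all()
   assert result.equals(table)
   ```

   Decompression then happens inside the C++ layer on read, which matches the lower decompression latency observed above.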
   
   > * If it's not an option to use Arrow IPC buffer compression (e.g. because 
it's not implemented in the Arrow library you're using), then:
   >   
   >   * If the network is very fast and data transfer costs are not a concern 
at all, don't use any HTTP compression.
   
   Even then I would recommend `zstd`. The network has to be very fast and 
reliable for `zstd` not to be helpful.
   
   >   * If the network is fairly fast and data transfer costs are not a major 
concern, zstd is often the best all-around balanced option (but YMMV so try it 
yourself in your real-world environment on a representative sample of datasets).
   
   I would emphasize the `zstd` recommendation more. Almost no network is as 
reliable as the loopback interface at 127.0.0.1 :D
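
   One way to sanity-check the `zstd` recommendation on your own data is to compress a representative IPC payload with pyarrow's `Codec` and look at the ratio and timing (the table below is a made-up stand-in for a real dataset):

   ```python
   import time

   import pyarrow as pa
   import pyarrow.ipc as ipc

   # Made-up stand-in for a representative dataset
   table = pa.table({"v": [i % 1000 for i in range(500_000)]})

   # Serialize to an uncompressed IPC stream, as an HTTP body would be
   sink = pa.BufferOutputStream()
   with ipc.new_stream(sink, table.schema) as writer:
       writer.write_table(table)
   payload = sink.getvalue()

   # Compress the whole payload with zstd and measure the cost
   codec = pa.Codec("zstd")
   start = time.perf_counter()
   compressed = codec.compress(payload)
   elapsed = time.perf_counter() - start
   print(f"{payload.size} -> {compressed.size} bytes in {elapsed:.3f}s")
   ```

   Running this against a real dataset (and a real network) is what settles whether whole-body HTTP compression pays for itself.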
   
   >   * If the network is slower or data transfer costs are a major concern, 
try experimenting with other HTTP compression codecs.
   
   Indeed.
   
   

