Dandandan commented on PR #576: URL: https://github.com/apache/arrow-ballista/pull/576#issuecomment-1372458204
Hey @yahoNanJing - you're right, it's a trade-off between CPU and bandwidth/memory. Ideally we would compress only once (in the shuffle writer) and then just stream the files from disk instead of re-encoding (and recompressing) them to IPC. That would save both network and disk IO (write + read) while paying the CPU overhead only once! LZ4 is much lighter on the CPU than ZSTD, so it would be great if we could make this configurable too!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
