mrocklin commented on issue #38389: URL: https://github.com/apache/arrow/issues/38389#issuecomment-1779469247
For S3, we've found that 2-3x numcpus threads is pretty good. One gets about 50 MB/s per S3 connection, and total aggregate S3 bandwidth on Amazon is correlated with machine size (larger machines with more cores have more bandwidth). This scales linearly for modestly sized machines, so 2-3x ends up being a good general rule. This is made more explicit at the top of the notebook I shared (using more threads with raw S3 access results in greater aggregate bandwidth).
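
A minimal sketch of the rule of thumb above, assuming a plain thread pool drives the S3 reads. The helper names (`suggested_io_threads`, `estimated_bandwidth_mb_s`, `fetch_all`) and the `fetch` callable are hypothetical and only illustrate the arithmetic; the 50 MB/s figure is the rough per-connection number quoted in the comment, not a guarantee.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Rough per-connection throughput quoted in the comment (an estimate, not a limit).
PER_CONNECTION_MB_S = 50


def suggested_io_threads(num_cpus=None, multiplier=3):
    """Return multiplier * num_cpus (the comment suggests 2-3x), with a floor of 1."""
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    return max(1, int(num_cpus * multiplier))


def estimated_bandwidth_mb_s(threads, per_connection=PER_CONNECTION_MB_S):
    """Naive linear estimate; real throughput is capped by the instance's network limit."""
    return threads * per_connection


def fetch_all(fetch, keys, threads):
    """Run a caller-supplied fetch(key) over a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, keys))


if __name__ == "__main__":
    threads = suggested_io_threads()
    print(f"threads: {threads}, "
          f"naive aggregate estimate: {estimated_bandwidth_mb_s(threads)} MB/s")
```

In practice `fetch` would wrap whatever S3 client is in use. With PyArrow, the size of the I/O thread pool can be adjusted with `pyarrow.set_io_thread_count`, though whether 2-3x is the right value there is exactly what this thread is discussing.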
