eeroel commented on issue #38664:
URL: https://github.com/apache/arrow/issues/38664#issuecomment-1807048863

   Did some benchmarking against AWS CLI (`aws s3 cp`):
   - AWS CLI also connected to only one IP address, so that's probably OK and
not a bottleneck at these rates. I also tested setting
https://curl.se/libcurl/c/CURLOPT_DNS_SHUFFLE_ADDRESSES.html but didn't see any
performance improvement, although connections were made to different IPs.
   - AWS CLI is 15-20% faster on my computer, so there could be some room for
optimization in the pre-buffer / cache parameters, but it's not a major
difference. Interestingly, the CLI downloads the file mostly in 9MB or 18MB
chunks, with the 9MB chunks arriving at half the rate of the 18MB ones.
   - I mentioned above that Colab took 15s to download the file, but this must
have been an instance outside of the US; on another instance I get 2-4s
download times (both with AWS CLI and pyarrow 13/14).
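
   For anyone who wants to repeat this kind of comparison, here is a minimal
timing sketch. It is not the script used for the numbers above. `benchmark`,
`percent_faster` and `read_with_pyarrow` are hypothetical helper names, and
`read_with_pyarrow` assumes the file is Parquet and is read through
`pyarrow.parquet.read_table` with its `pre_buffer` option; adjust to whatever
reader you are actually measuring.

```python
import time
from statistics import median


def benchmark(download, n_bytes, repeats=3):
    """Call download() `repeats` times; return (median seconds, median MB/s)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        download()
        timings.append(time.perf_counter() - start)
    seconds = median(timings)
    return seconds, n_bytes / seconds / 1e6


def percent_faster(baseline_seconds, other_seconds):
    """How much faster `other` is than `baseline`, in percent of throughput."""
    return (baseline_seconds / other_seconds - 1.0) * 100.0


def read_with_pyarrow(path, region, pre_buffer=True):
    """Hypothetical helper: read one Parquet file from S3 with pyarrow.

    Assumption: `path` has the form "bucket/key" and credentials are
    available in the environment.
    """
    # Imported lazily so the timing helpers work without pyarrow installed.
    import pyarrow.parquet as pq
    from pyarrow import fs

    s3 = fs.S3FileSystem(region=region)
    return pq.read_table(path, filesystem=s3, pre_buffer=pre_buffer)
```

   A call such as `benchmark(lambda: read_with_pyarrow("my-bucket/file.parquet",
"us-east-1"), n_bytes=file_size)` would time the pyarrow side (bucket, key,
region and `file_size` are placeholders). The same `benchmark` can wrap a
`subprocess.run` call to `aws s3 cp` for the CLI side, and `percent_faster`
then compares the two median times.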


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
