wjones127 opened a new issue, #104:
URL: https://github.com/apache/arrow-rs-object-store/issues/104

   **Describe the bug**
   
   I've been investigating object store performance, and I'm finding that, 
regardless of the number of concurrent requests I make, I only get about 
250 MB/s of throughput. However, I can get much higher throughput if I create 
a new object store for every request. This seems to indicate some sort of 
throughput limit per object store instance.
   
   I'm not sure, but the limit could come from the HTTP/2 connection somehow.
   
   Also, I'm not sure this is limited to GCS. That is just the only store 
I've tested so far. It might also apply to S3.
   
   **To Reproduce**
   You can see the test script I've been using here:
   
   
https://github.com/wjones127/object-store-bench/blob/7ce73db2aaf4c7077cdfc20aa4e0c3cd0e93c59c/src/download.rs
   
   If you change `fetch_range_len()` to re-create the object store on each 
invocation, you will see much faster download speeds.
   
   
   Concurrent tasks | Single instance, time (s) | Re-create instance, time (s) | Single instance, MB/s | Re-create instance, MB/s
   -- | -- | -- | -- | --
   1 | 5.05 | 5.43 | 197.9 | 184.2
   5 | 3.96 | 2.02 | 252.2 | 494
   10 | 3.76 | 1.67 | 265.7 | 597.2
   20 | 4.24 | 1.26 | 235.7 | 793.8
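
To make the gap easier to read, here is a small sketch that computes the throughput ratio (re-create instance over single instance) for each row of the table. The `Row` struct and the `rows()` and `speedup()` helpers are names made up for this illustration; the numbers are copied from the table and nothing else is measured.

```rust
/// One row of the benchmark table (throughput columns only).
struct Row {
    tasks: u32,
    single_mb_s: f64,
    recreate_mb_s: f64,
}

/// Throughput of the re-create variant relative to the shared instance.
fn speedup(row: &Row) -> f64 {
    row.recreate_mb_s / row.single_mb_s
}

/// The four measurements reported in the table.
fn rows() -> Vec<Row> {
    vec![
        Row { tasks: 1, single_mb_s: 197.9, recreate_mb_s: 184.2 },
        Row { tasks: 5, single_mb_s: 252.2, recreate_mb_s: 494.0 },
        Row { tasks: 10, single_mb_s: 265.7, recreate_mb_s: 597.2 },
        Row { tasks: 20, single_mb_s: 235.7, recreate_mb_s: 793.8 },
    ]
}

fn main() {
    for row in rows() {
        println!("{:>2} tasks: {:.2}x", row.tasks, speedup(&row));
    }
}
```

The ratios come out to roughly 0.93x, 1.96x, 2.25x and 3.37x for 1, 5, 10 and 20 tasks: the shared instance stays flat while the re-created instances keep scaling.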
   
   **Expected behavior**
   
   In a lot of programs we use `Arc<dyn ObjectStore>`, and we'd like to share 
it across multiple threads and get throughput that scales with the number of 
concurrent calls. If there is some underlying limit we can remove, that would 
be ideal.
   
   However, if there is some fundamental limit, then at the very least we 
should (1) document it and (2) provide a way to easily clone a `Box<dyn 
ObjectStore>` into a totally independent instance for use in concurrent 
requests.
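
As a rough illustration of what option (2) could look like on the caller's side, here is a minimal sketch of a round-robin pool over several independently constructed instances. Everything in it is hypothetical: `RoundRobinPool`, the `Store` trait and `FakeStore` are stand-ins invented for this example and are not part of `object_store`. It also assumes that separately built instances do not share a connection, which the benchmark suggests but does not prove.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical helper: holds several independently built clients
/// and hands them out in round-robin order.
struct RoundRobinPool<T: ?Sized> {
    clients: Vec<Arc<T>>,
    next: AtomicUsize,
}

impl<T: ?Sized> RoundRobinPool<T> {
    fn new(clients: Vec<Arc<T>>) -> Self {
        assert!(!clients.is_empty(), "pool needs at least one client");
        Self { clients, next: AtomicUsize::new(0) }
    }

    /// Returns the next client; safe to call from many threads.
    fn get(&self) -> Arc<T> {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.clients.len();
        Arc::clone(&self.clients[i])
    }
}

/// Stand-in for a store trait (an assumption, not the real `ObjectStore`).
trait Store: Send + Sync {
    fn id(&self) -> usize;
}

/// Stand-in implementation that only remembers which instance it is.
struct FakeStore(usize);

impl Store for FakeStore {
    fn id(&self) -> usize {
        self.0
    }
}

/// Builds `n` separate instances up front instead of one per request.
fn build_pool(n: usize) -> RoundRobinPool<dyn Store> {
    let clients = (0..n)
        .map(|i| Arc::new(FakeStore(i)) as Arc<dyn Store>)
        .collect();
    RoundRobinPool::new(clients)
}

fn main() {
    let pool = build_pool(3);
    let ids: Vec<usize> = (0..5).map(|_| pool.get().id()).collect();
    println!("{:?}", ids);
}
```

Building the instances once and rotating through them avoids paying the construction cost on every request, at the price of choosing the pool size by hand.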
   

