Sorry for the months of delay, but I finally returned and caught up with all my other projects. I had time to look at this benchmark and unfortunately it raises more questions than it answers.
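
One question the numbers quoted below raise is whether the latency column measures anything beyond queueing. As a rough sanity check, assume the test runner is a closed loop where each thread issues its next request only after the previous one completes (the setup description does not say either way). Little's law then predicts throughput of about threads / average response time. The sketch below does only that arithmetic on the rows from the quoted table; the class and method names are made up for illustration, and the bandwidth figure assumes 1 MB = 8 Mbit.

```java
import java.util.Locale;

final class LittlesLawCheck {
    // Little's law for a closed loop (an assumption about the test runner):
    // throughput = concurrent clients / average latency.
    static double predictedThroughput(int clientThreads, double avgLatencyMs) {
        return clientThreads / (avgLatencyMs / 1000.0);
    }

    // Assumes decimal units: 1 MB of payload = 8 Mbit on the wire, ignoring overhead.
    static double bandwidthMbitPerSec(double requestsPerSec, double payloadMegabytes) {
        return requestsPerSec * payloadMegabytes * 8.0;
    }

    static void print(String approach, double[][] rows) {
        for (double[] row : rows) {
            int threads = (int) row[0];
            double predicted = predictedThroughput(threads, row[1]);
            System.out.printf(Locale.ROOT,
                    "%-14s threads=%2d predicted=%5.1f req/s reported=%2.0f req/s bandwidth=%4.0f Mbit/s%n",
                    approach, threads, predicted, row[2],
                    bandwidthMbitPerSec(row[2], 1.0));
        }
    }

    public static void main(String[] args) {
        // {threads, avg response time (ms), reported requests/sec}, copied from the quoted table.
        double[][] asyncRows = {{1, 45, 22}, {5, 107, 47}, {10, 209, 48}};
        double[][] streamRows = {{1, 41, 24}, {5, 190, 26}, {10, 392, 25}};
        print("Async Http Lib", asyncRows);
        print("OutputStream", streamRows);
    }
}
```

Under that assumption every predicted value lands within about 1 request/sec of the reported one, which would suggest the response times mostly reflect requests waiting behind the single Jetty worker thread. The best row works out to 48 x 8 = 384 Mbit/s, well short of 1 Gbit.
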

Frankly, this is still too hard to follow, even after I spent an hour trying to understand it. You cannot expect someone to build four different repositories and copy around random jars just to run a benchmark! Ideally I could run mvn test and it would exercise the proposed jclouds async code and the existing jclouds sync code, as well as whatever setup they need. You could even embed this within your jclouds fork. Please do not ask me to use random jars checked into GitHub; instead provide instructions to mvn install from the required trees.

You should present your benchmarks in the context of jclouds, not S3Proxy. The latter clouds the former with whatever overhead it introduces and with its thread management, which is potentially the cause of your latency. Further, while I maintain both jclouds and S3Proxy, each has different considerations in terms of merging code. For example, we might merge limited async support in jclouds without using it in S3Proxy, at least initially.

Regarding benchmarking, surely we expect async code to outperform sync code with a limited number of threads. How does this change with 100 or even 1000 threads? Threads only cost 512 KB of stack space each, so we should not fear using many on modern hardware. The bandwidth measurements confuse me since I have previously saturated 1 Gbit with only a few synchronous threads. The latency measurements confuse me since requests would just pile up in the sync case if there are more clients than Jetty threads.

Looking through the commit history, I would need to read 27 commits just to understand the jclouds changes. These include commented-out code and rewrites of code which are hard to follow. I recommend rewriting the history before I look again, hopefully separated into logical units and not one big commit. It would help to have some kind of high-level explanation of what the changes do.

I want you to understand the impact that adding a new portable API has on jclouds, which requires cross-cloud support.
Are you willing to contribute GCS and S3 support as well as your proposed Azure support (so-called rule of three)? Putting my S3Proxy hat on briefly, I minimize the number of provider-specific call sites so you should consider how a single provider without async support complicates the code.

Backing up, you should explain what you want to accomplish with asynchronous APIs. Do you expect to have 100, 1000, or more concurrent requests? Is your primary concern performance or some other limitation?

Supporting your request, the AWS SDK just added support for async:

https://aws.amazon.com/blogs/aws/now-available-developer-preview-of-aws-sdk-for-java-2-0/

On Mon, May 29, 2017 at 07:36:56AM +0000, Battula Kishore wrote:
> Hi Andrew,
>
> Thanks andrew for the quick response. Here is the GitHub repo and
> instructions on how to run the tests
> https://github.com/kishore25kumar/s3proxy-async-test-setup. The READE.md also
> has the repo details of s3proxy as well as jclouds implementation. Hope this
> helps? Let me know if you need anything else?
>
> In the mean time you review these results if you can let me know the design
> review process I can be prepared for that.
>
> -- Thanks
> -- Kishore
>
> On 26/05/17, 12:40 PM, "Andrew Gaul" <g...@apache.org> wrote:
>
> >Kishore, these are promising results! I reformatted the most important
> >rows which show a 2x improvement in throughput and latency:
> >
> >10 10,000 Async Http Lib 209 282 48
> >10 10,000 OutputStream 392 542 25
> >
> >Can you share the implementation and include instructions on how to
> >replicate these tests?
> >
> >On Wed, May 24, 2017 at 05:50:52AM +0000, Battula Kishore wrote:
> >> Hi,
> >>
> >> This is Kishore who is working on async poc using mail
> >> id(kishore25ku...@gmail.com). I work at
> >> adobe and we wanted to implement async support for jclouds library and
> >> contribute it back.
> >>
> >> From the last discussion I was asked to get the performance numbers for
> >> the two approaches.
> >> Approach 1: Using Http Async Library
> >> Approach 2: Using Outputstream
> >>
> >> Test setup:
> >>
> >> 1. Both the s3 proxy server and test runner are running in same
> >> Docker container in azure west-us region.
> >>
> >> 2. Azure storage account is also residing in same west-us region.
> >>
> >> 3. A bucket is prepopulated with 100,000 files, each file of 1 MB
> >> size before test start.
> >>
> >> 4. The test runner sends unique requests to s3proxy to download
> >> files.
> >>
> >> Virtual Machine spec: CPU - 8 cores, Memory - 28 GB (Standard_D4 Azure
> >> machine)
> >>
> >> S3proxy is running with 1 jetty worker thread in all the scenarios. The
> >> payload size used is 1 MB file. Here are the performance numbers.
> >>
> >> Test Runner  Iteration   Approach        Avg response  99%tile    Throughput
> >> Threads      Per thread                  time (ms)     time (ms)  (Requests / sec)
> >> 1            10,000      Async Http Lib  45            87         22
> >> 5            10,000      Async Http Lib  107           159        47
> >> 10           10,000      Async Http Lib  209           282        48
> >> 1            10,000      OutputStream    41            85         24
> >> 5            10,000      OutputStream    190           283        26
> >> 10           10,000      OutputStream    392           542        25
> >>
> >>
> >> Summary: Under load Http Async Library approach is providing more
> >> throughput compared to Output stream approach.
> >>
> >>
> >> Both the approaches improve performance. The output stream approach can be
> >> used along with Http Async library approach which is giving around (3-5
> >> ms) improvement in latency.
> >>
> >> Each approach is independent development.
> >> At this point I am keen to take
> >> up Http Async Library development.
> >>
> >>
> >> -- Thanks
> >> -- Kishore
> >>
> >
> >--
> >Andrew Gaul
> >http://gaul.org/

--
Andrew Gaul
http://gaul.org/