Hi,

Thanks for getting in touch about this!
When you ran the tests, what level of load did you start with? I.e. for 50 req/s, did you start at 0 req/s and jump straight to 50, or did you ramp the load up in steps? I ask because the load monitor learns the appropriate token rate from the latency of requests.

The load monitor will admit a request if there are tokens available in its token bucket (at the cost of one token). It also replenishes the tokens based on the current token rate and how long it has been since the bucket was last replenished. Every 20 requests it takes the latency of those requests (as a smoothed mean) and compares it to the target latency (100ms). If it is lower, the token rate is increased; how much it increases depends on how far below the target the average latency is. There's a rough sketch of this mechanism below.

This means that from a cold start there is a slow-start period while the load monitor ramps up the token replenishment rate (there is a limit on how fast the token rate can rise). If you are starting from a cold start, can you please try running at the higher load for longer than 20 seconds and see what happens? If you're not, or you don't see any improvement when running the test for a couple of minutes, can you let me know what latencies you're seeing on the requests?

We've not exposed the default bucket size as a configuration option; I've raised an issue in cpp-common to do this (see https://github.com/Metaswitch/cpp-common/issues/199). If you want to change it yourself, it's set at https://github.com/Metaswitch/sprout/blob/dev/sprout/main.cpp#L170.
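To make that concrete, here is a rough, standalone sketch of the scheme as a single C++ file. It isn't the real cpp-common LoadMonitor code: the class name, the smoothing weights, the per-adjustment rate cap and the numbers in main() are assumptions made purely for illustration. Only the 100ms target, the bucket size of 20 and the 20-request adjustment interval come from the description above.

// A standalone sketch of the token-bucket scheme described above - not the
// real cpp-common LoadMonitor (class name, smoothing factor, rate cap and
// the numbers in main() are assumptions made for illustration).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>

class TokenBucketMonitor {
public:
  using Clock = std::chrono::steady_clock;

  TokenBucketMonitor(double target_latency_ms, double initial_rate, double bucket_size)
    : _target_latency_ms(target_latency_ms),
      _rate(initial_rate),                   // tokens replenished per second
      _bucket_size(bucket_size),
      _tokens(bucket_size),                  // start with a full bucket
      _smoothed_latency_ms(target_latency_ms),
      _requests_since_adjustment(0),
      _last_refill(Clock::now()) {}

  // Admit a request if a token is available, at the cost of one token.
  // In Sprout a rejection here is what surfaces as the 503s in your logs.
  bool admit_request() {
    replenish();
    if (_tokens >= 1.0) {
      _tokens -= 1.0;
      return true;
    }
    return false;
  }

  // Called when a request completes, with the latency it experienced.
  void request_complete(double latency_ms) {
    // Smoothed mean latency (the 0.9/0.1 weighting is an assumption).
    _smoothed_latency_ms = (0.9 * _smoothed_latency_ms) + (0.1 * latency_ms);

    // Every 20 requests, compare the smoothed latency to the target; if we
    // are under target, raise the token rate.  The further under target we
    // are, the bigger the increase, up to a cap - which is why the rate can
    // only climb so fast from a cold start.
    if (++_requests_since_adjustment >= 20) {
      _requests_since_adjustment = 0;
      if (_smoothed_latency_ms < _target_latency_ms) {
        double headroom = _target_latency_ms / std::max(_smoothed_latency_ms, 1.0);
        _rate *= std::min(headroom, 1.2);    // per-adjustment cap of 1.2 is assumed
      }
    }
  }

  double current_rate() const { return _rate; }

private:
  // Top the bucket up according to the current rate and the time elapsed
  // since the last refill, never exceeding the bucket size.
  void replenish() {
    Clock::time_point now = Clock::now();
    double elapsed_s = std::chrono::duration<double>(now - _last_refill).count();
    _tokens = std::min(_bucket_size, _tokens + (_rate * elapsed_s));
    _last_refill = now;
  }

  double _target_latency_ms;
  double _rate;
  double _bucket_size;
  double _tokens;
  double _smoothed_latency_ms;
  int _requests_since_adjustment;
  Clock::time_point _last_refill;
};

int main() {
  // Cold start: a 20-token bucket, an assumed initial rate of 10 tokens/s,
  // and the 100ms latency target.
  TokenBucketMonitor monitor(100.0, 10.0, 20.0);
  int admitted = 0;
  int rejected = 0;

  // Offer 50 req/s for two seconds, pretending each request takes 30ms.
  for (int i = 0; i < 100; ++i) {
    if (monitor.admit_request()) {
      ++admitted;
      monitor.request_complete(30.0);
    } else {
      ++rejected;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
  }

  std::printf("admitted=%d rejected=%d final rate=%.1f tokens/s\n",
              admitted, rejected, monitor.current_rate());
  return 0;
}

If you compile and run it, the simulated 50 req/s cold start should produce a burst of rejections early on while the token rate climbs one adjustment interval at a time - the same slow-start effect a fixed-rate 20-second test would run into.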
Hope this helps,
Ellie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sushant Hiray
Sent: 16 November 2014 11:45
To: [email protected]
Cc: Anmol Garg
Subject: [Clearwater] Subpar results while benchmarking Clearwater

Dear All,

As part of our project we have been trying to benchmark our Project Clearwater setup. Our current setup is one node for each component (i.e. no clustering whatsoever). We installed it using the Manual Install method from the docs. Each VM has 2GB RAM and runs Ubuntu 12.04 LTS.

Experimental setup: We modified the UCT IMS client source code to send requests at a specified rate. In the experiment we send requests from the client to Bono at 10 req/s, 20 req/s, 30 req/s, 40 req/s and 50 req/s. We ran each experiment for 20 seconds, so we essentially sent requests at 10 req/s for 20 seconds, and so on, and collected the results.

Here are some of the results we obtained:
Throughput Graph <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Throughtput.png>
Message Exchange: Client-Bono <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Client-Bono.png>
Message Exchange: Bono-Sprout <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Bono-Sprout.png>
Message Exchange: Sprout-Homestead <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Sprout-Hs.png>

As you can see from the throughput graph, the bottleneck is somewhere between 10-20 req/s, which is quite low. We also tracked CPU usage, which stayed around 4-5% for the entire duration, and memory usage, which stayed around 60-70%, so there is no obvious reason for a bottleneck at such a low rate. Nevertheless, we scaled Sprout to see whether the results improved.

But apart from a minor increase in successful requests, there is no real improvement: the throughput graph follows the same trend and the bottleneck throughput is still below 20 req/s. Here is the throughput graph we obtained after benchmarking these requests:
Throughput Graph <https://github.com/sushant-hiray/sip-dpi/blob/master/Version2/Graphs/Throughtput.png>

From the Sprout logs we found that it was generating 503 (Service Unavailable) error messages. We went through the Sprout source code and it appears that LoadMonitor::admit_request is creating an artificial bottleneck: once it exhausts its bucket it rejects requests because no further tokens are available.

I have some questions:

Question 1: Is there any way to set the default bucket size to something other than 20? I feel that is the problem here.

Question 2: Has anyone tried benchmarking such a bare-minimum deployment? I've seen the benchmarking results on the Clearwater website <http://www.projectclearwater.org/technical/clearwater-performance/>, but they are for a much larger number of VMs.

PS: We ran similar benchmark tests on OpenIMSCore and got much better results (for instance, the bottleneck in the bare-bones version was 30 req/s, and the MySQL server was clearly the limiting factor there). We are still unable to identify a resource bottleneck (memory or CPU) in the Clearwater system, which is why we believe we are hitting an artificial bottleneck. Does this analysis seem correct?

Looking forward to your response.

Regards,
Sushant Hiray,
Senior Undergrad CSE, IIT Bombay

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
