Hi,

Thanks for getting in touch about this!
When you ran the tests, what level of load did you start with? I.e. for 50 req/s, did you start at 0 req/s and jump straight to 50, or did you ramp the load up in steps? I ask because the load monitor learns the appropriate token rate from the latency of requests.

The load monitor will admit a request if there are tokens available in its token bucket (at the cost of one token). It also replenishes the tokens based on the current token rate and how long it has been since the bucket was last replenished. Every 20 requests it takes the latency of those requests (as a smoothed mean) and compares it to the target latency (100ms). If it is lower, the token rate is increased; how much it increases depends on how far below the target the average latency is. There's a rough sketch of this mechanism below.

This means that from a cold start there is a slow-start period while the load monitor ramps up the token replenishment rate (there is a limit on how fast the token rate can rise). If you are starting from a cold start, can you please try running at the higher load for longer than 20 seconds and see what happens? If you're not, or you don't see any improvement when running the test for a couple of minutes, can you let me know what latencies you're seeing on the requests?

We've not exposed the default bucket size as a configuration option; I've raised an issue in cpp-common to do this (see https://github.com/Metaswitch/cpp-common/issues/199). If you want to change it yourself, it's set at https://github.com/Metaswitch/sprout/blob/dev/sprout/main.cpp#L170.
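To make that concrete, here is a rough, standalone sketch of the scheme as a single C++ file. It isn't the real cpp-common LoadMonitor code: the class name, the smoothing weights, the per-adjustment rate cap and the numbers in main() are assumptions made purely for illustration. Only the 100ms target, the bucket size of 20 and the 20-request adjustment interval come from the description above.

// A standalone sketch of the token-bucket scheme described above - not the
// real cpp-common LoadMonitor (class name, smoothing factor, rate cap and
// the numbers in main() are assumptions made for illustration).
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <thread>

class TokenBucketMonitor {
public:
  using Clock = std::chrono::steady_clock;

  TokenBucketMonitor(double target_latency_ms, double initial_rate, double bucket_size)
    : _target_latency_ms(target_latency_ms),
      _rate(initial_rate),                   // tokens replenished per second
      _bucket_size(bucket_size),
      _tokens(bucket_size),                  // start with a full bucket
      _smoothed_latency_ms(target_latency_ms),
      _requests_since_adjustment(0),
      _last_refill(Clock::now()) {}

  // Admit a request if a token is available, at the cost of one token.
  // In Sprout a rejection here is what surfaces as the 503s in your logs.
  bool admit_request() {
    replenish();
    if (_tokens >= 1.0) {
      _tokens -= 1.0;
      return true;
    }
    return false;
  }

  // Called when a request completes, with the latency it experienced.
  void request_complete(double latency_ms) {
    // Smoothed mean latency (the 0.9/0.1 weighting is an assumption).
    _smoothed_latency_ms = (0.9 * _smoothed_latency_ms) + (0.1 * latency_ms);

    // Every 20 requests, compare the smoothed latency to the target; if we
    // are under target, raise the token rate.  The further under target we
    // are, the bigger the increase, up to a cap - which is why the rate can
    // only climb so fast from a cold start.
    if (++_requests_since_adjustment >= 20) {
      _requests_since_adjustment = 0;
      if (_smoothed_latency_ms < _target_latency_ms) {
        double headroom = _target_latency_ms / std::max(_smoothed_latency_ms, 1.0);
        _rate *= std::min(headroom, 1.2);    // per-adjustment cap of 1.2 is assumed
      }
    }
  }

  double current_rate() const { return _rate; }

private:
  // Top the bucket up according to the current rate and the time elapsed
  // since the last refill, never exceeding the bucket size.
  void replenish() {
    Clock::time_point now = Clock::now();
    double elapsed_s = std::chrono::duration<double>(now - _last_refill).count();
    _tokens = std::min(_bucket_size, _tokens + (_rate * elapsed_s));
    _last_refill = now;
  }

  double _target_latency_ms;
  double _rate;
  double _bucket_size;
  double _tokens;
  double _smoothed_latency_ms;
  int _requests_since_adjustment;
  Clock::time_point _last_refill;
};

int main() {
  // Cold start: a 20-token bucket, an assumed initial rate of 10 tokens/s,
  // and the 100ms latency target.
  TokenBucketMonitor monitor(100.0, 10.0, 20.0);
  int admitted = 0;
  int rejected = 0;

  // Offer 50 req/s for two seconds, pretending each request takes 30ms.
  for (int i = 0; i < 100; ++i) {
    if (monitor.admit_request()) {
      ++admitted;
      monitor.request_complete(30.0);
    } else {
      ++rejected;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
  }

  std::printf("admitted=%d rejected=%d final rate=%.1f tokens/s\n",
              admitted, rejected, monitor.current_rate());
  return 0;
}

If you compile and run it, the simulated 50 req/s cold start should produce a burst of rejections early on while the token rate climbs one adjustment interval at a time - the same slow-start effect a fixed-rate 20-second test would run into.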
Hope this helps,
Ellie

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Sushant Hiray
Sent: 16 November 2014 11:45
To: [email protected]
Cc: Anmol Garg
Subject: [Clearwater] Subpar results while benchmarking Clearwater

Dear All,

As part of our project we have been trying to benchmark our Project Clearwater setup. Our current setup is one node for each component (i.e. no clustering whatsoever). We installed it using the Manual Install method from the docs. Each VM has 2GB RAM and runs Ubuntu 12.04 LTS.

Experimental setup: We modified the UCT IMS client source code to send requests at a specified rate. In the experiment we send requests from the client to Bono at 10 req/s, 20 req/s, 30 req/s, 40 req/s and 50 req/s. We ran each experiment for 20 seconds, so we essentially sent requests at 10 req/s for 20 seconds, and so on, and collected the results.

Here are some of the results we obtained:
Throughput Graph <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Throughtput.png>
Message Exchange: Client-Bono <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Client-Bono.png>
Message Exchange: Bono-Sprout <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Bono-Sprout.png>
Message Exchange: Sprout-Homestead <https://github.com/sushant-hiray/sip-dpi/blob/master/Version1/Graphs/Sprout-Hs.png>

As you can see from the throughput graph, the bottleneck is somewhere between 10-20 req/s, which is quite low. We also tracked CPU usage, which stayed around 4-5% for the entire duration, and memory usage, which stayed around 60-70%, so there is no obvious reason for a bottleneck at such a low rate. Nevertheless, we scaled Sprout to see whether the results improved.

But apart from a minor increase in successful requests, there is no real improvement: the throughput graph follows the same trend and the bottleneck throughput is still below 20 req/s. Here is the throughput graph we obtained after benchmarking these requests:
Throughput Graph <https://github.com/sushant-hiray/sip-dpi/blob/master/Version2/Graphs/Throughtput.png>

From the Sprout logs we found that it was generating 503 (Service Unavailable) error messages. We went through the Sprout source code and it appears that LoadMonitor::admit_request is creating an artificial bottleneck: once it exhausts its bucket it rejects requests because no further tokens are available.

I have some questions:

Question 1: Is there any way to set the default bucket size to something other than 20? I feel that is the problem here.

Question 2: Has anyone tried benchmarking such a bare-minimum deployment? I've seen the benchmarking results on the Clearwater website <http://www.projectclearwater.org/technical/clearwater-performance/>, but they are for a much larger number of VMs.

PS: We ran similar benchmark tests on OpenIMSCore and got much better results (for instance, the bottleneck in the bare-bones version was 30 req/s, and the MySQL server was clearly the limiting factor there). We are still unable to identify a resource bottleneck (memory or CPU) in the Clearwater system, which is why we believe we are hitting an artificial bottleneck. Does this analysis seem correct?

Looking forward to your response.

Regards,
Sushant Hiray,
Senior Undergrad CSE, IIT Bombay

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/listinfo/clearwater
