New question #286655 on Graphite:
https://answers.launchpad.net/graphite/+question/286655

I've built a new Graphite/Grafana server for a project I'm working on at work.

It's a reasonably spec'd Virtual Machine with 4x vCPUs and is hooked up to a 
super-fast AllFlashArray via our Corporate vSphere.

However, I'm having a performance problem with my two carbon-cache daemons as 
per the following:

- I've got a number of high-performance production servers firing metrics 
directly into port 2004 on my Graphite server within the same VLAN.
- This totals around 285,000 metrics per minute.
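
For context, here is a minimal sketch of how a sender could frame a batch of
datapoints, assuming port 2004 is carbon's pickle receiver (the conventional
default; my actual senders may differ). The metric name and values are
made up for illustration.

```python
import pickle
import struct


def build_pickle_message(datapoints):
    """Frame datapoints for carbon's pickle receiver.

    datapoints is a list of (metric_path, (timestamp, value)) tuples.
    The wire format is a 4-byte big-endian length header followed by
    the pickled list.
    """
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))
    return header + payload


# Hypothetical metric name and values, for illustration only.
sample = [("servers.web01.cpu.user", (1454500000, 12.5))]
message = build_pickle_message(sample)
```

The resulting bytes would then be written to a TCP socket connected to the
Graphite server on port 2004; batching many datapoints per message keeps the
number of network writes low.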

I was running 1 carbon-cache (and no relay), but my carbon dashboard on Grafana 
was indicating that I was hitting:
Cache.Size = 1 Million
Cache.Queue = 260,000

So, I've put a carbon-relay in front and set up two carbon-cache daemons to 
help with the load, and now I'm seeing this:

http://picpaste.com/carbon_dashboard-BzvdqOJ8.PNG

As you can hopefully see from this picture:

- Carbon Relay (third row) is receiving the ~280K metrics and passing them to 
Cache A and Cache B at roughly 50/50
- Carbon Cache B (the new one) is receiving ~140K metrics, and committing ~140K 
metrics, and updating ~140K metrics every minute. It's also using around 45-50% 
of 1 CPU
- Carbon Cache A (the original one) is receiving ~135K metrics, committing 
~135K metrics, but only updating ~20-25K metrics every minute. It's also using 
more CPU than Cache B, at around 55-65% of 1 CPU, yet it is receiving slightly 
fewer metrics and performing far fewer updates per minute.

As a result, Cache A now has a cache.size of around 400K and a cache.queue of 
around 130K - approx half of what it was before.

What on earth is going on? How can Carbon Cache B be processing and 
storing/updating its ~50% of the metrics instantly with no cache at all, yet 
Carbon Cache A is struggling? I'm seeing delays in metrics being rendered and I 
can only assume it's because they are stuck in the cache for Carbon Cache A.

I also don't understand why, given a deficit of ~110K per minute between Carbon 
Cache A's metricsReceived and updateOperations, the cache isn't growing by the 
same amount every minute. As you can see, it's staying constant at around 130K.
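
To make the numbers concrete, here is a rough sketch of the arithmetic. It
rests on an assumption I'm not certain of: that committedPoints counts
datapoints written to disk, while updateOperations counts write calls, each of
which may flush several queued datapoints for one metric. The 22,500 figure is
just the midpoint of the ~20-25K range above.

```python
def summarize(received, committed, update_ops):
    """Return (backlog growth per minute, datapoints per update operation).

    Assumes 'committed' counts datapoints written and 'update_ops' counts
    write calls, so one operation can carry more than one datapoint.
    """
    backlog_growth = received - committed
    points_per_update = committed / update_ops if update_ops else float("nan")
    return backlog_growth, points_per_update


# Per-minute figures read off the dashboard (approximate).
cache_a = summarize(received=135_000, committed=135_000, update_ops=22_500)
cache_b = summarize(received=140_000, committed=140_000, update_ops=140_000)

print(cache_a)  # (0, 6.0)
print(cache_b)  # (0, 1.0)
```

If that reading is right, Cache A would be keeping up in terms of datapoints
(no backlog growth) but writing roughly six datapoints per update, which would
mean points sit in its cache for several minutes before reaching disk.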


Here is my carbon.conf:

http://pastebin.com/5CrKNKzu

Would really appreciate anyone's time/advice on this so I can resolve the 
performance issues with Carbon Cache A.


-- 
You received this question notification because your team graphite-dev
is an answer contact for Graphite.

