New question #172630 on Graphite:
https://answers.launchpad.net/graphite/+question/172630

Hey all,

We've been trying out different solutions for getting data from collectd to 
Graphite, and it looks like we've hit a wall hooking AMQP into carbon. No 
matter what I try, I can't get carbon to consume metrics nearly as fast as we 
produce them. I've tried various combinations of small and large caches with 
small and large writes_per_second and creates_per_minute settings, but no 
matter what we do, the message queue fills up far faster than carbon can 
drain it.

Right now we're seeing about 34920 messages backlogged per minute in our 
message queue (582 per second).
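As a sanity check on those numbers, and on what they imply for whisper write rates: carbon coalesces all cached datapoints for a metric into a single whisper update, so the update rate it needs is bounded by the number of distinct metrics per flush, not the raw datapoint rate. The 5820-metric / 10-second figures below are assumptions for illustration, not measured from our setup:

```python
# Back-of-the-envelope check on the backlog figures above.
backlog_per_minute = 34920
datapoints_per_sec = backlog_per_minute / 60
print(datapoints_per_sec)  # 582.0

# Hypothetical illustration: 5820 distinct metrics reporting every
# 10 s produce the same 582 datapoints/s, but carbon only needs
# ~582 whisper updates/s if it flushes each metric once per
# interval -- and fewer still if the cache batches several
# intervals' worth of points into one write.
distinct_metrics = 5820        # assumed, not from our setup
report_interval_s = 10         # assumed collectd interval
updates_per_sec = distinct_metrics / report_interval_s
print(updates_per_sec)  # 582.0
```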

The carbon host has four 2 TB FC LUNs (it's 3PAR, so there are a lot of 15k FC 
drives behind it, with the files spread over the entire drive cluster), striped 
on the host using LVM for a total of ~8 TB. Regardless of which cache settings 
we use, we only see about 0.7% IO wait on the carbon host, and LVM spreads the 
load across all 4 LUNs, so I don't think storage is the issue. From what I've 
read, the sweet spot should be around ~50% IO wait on the disks.

I've also tried this going to local disks (15k SAS in RAID1) and we get the 
same results.

We're using ext4 with the deadline scheduler on RHEL6. The carbon host itself 
is a Dell M610 with 8 cores (Xeon X5570 @ 2.93GHz) and 48GB of RAM. With 
carbon off, IO utilization is 0%.

Our RabbitMQ cluster is 2 hosts with the same specs, sans the FC LUNs.

here is what we are using to get collectd data => message queue: 
https://github.com/poblahblahblah/collectd-http-carbon

here is our storage-schemas.conf file: 
https://gist.github.com/e88bc325926940d300d6

here is our carbon.conf file: https://gist.github.com/be63c1beae01b067600d

here is a bonnie++ run: https://gist.github.com/001eee920613aa30b42a

If we kill the unicorn process that sends the processed metrics to RabbitMQ, 
carbon eventually catches up and the graphs update as expected. Are we doing 
too many updates, or do we just need some intelligent way to split the data 
across multiple servers, with different patterns per carbon-cache process?
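To make the second option concrete: my understanding is that carbon-relay can do this sharding for real (either via regex rules in relay-rules.conf, or with RELAY_METHOD = consistent-hashing and a DESTINATIONS list). A minimal sketch of the idea, with made-up hosts and ports, would be:

```python
# Sketch: shard metric names across several carbon-cache instances
# by hashing the metric name. carbon-relay's consistent-hashing mode
# does this for real; the destinations below are hypothetical.
import hashlib

DESTINATIONS = [
    ("127.0.0.1", 2004),  # carbon-cache instance "a"
    ("127.0.0.1", 2104),  # carbon-cache instance "b"
    ("127.0.0.1", 2204),  # carbon-cache instance "c"
]

def destination_for(metric_name):
    """Hash the metric name to pick a destination, so a given metric
    always lands on the same carbon-cache -- meaning each whisper
    file has exactly one writer."""
    digest = hashlib.md5(metric_name.encode("utf-8")).hexdigest()
    return DESTINATIONS[int(digest, 16) % len(DESTINATIONS)]

# The same metric always maps to the same instance:
assert destination_for("collectd.web01.cpu.idle") == \
       destination_for("collectd.web01.cpu.idle")
```

The point of hashing rather than round-robin is that every datapoint for a metric must reach the same carbon-cache, otherwise two processes would fight over the same whisper file.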

Let me know if there is any other data which would help out. I am sure I am 
just doing something dumb with carbon.


-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.

_______________________________________________
Mailing list: https://launchpad.net/~graphite-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~graphite-dev
More help   : https://help.launchpad.net/ListHelp