New question #212209 on Graphite:
https://answers.launchpad.net/graphite/+question/212209

I have a rapidly-growing, evolving AWS deployment. My largest graphite cluster 
is currently one carbon-relay in front of six carbon-cache nodes using 
consistent hashing and memcached on each cache node. There are 450 EC2 
instances sending data to the carbon-relay via Joe Miller's collectd-graphite 
plugin. Each cache node shows between 35k-50k metricsReceived/minute (according 
to the carbon/agents graphite data).  The total metrics received per minute is 
around 240k. 
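
For reference, the relay side of that layout looks roughly like the sketch 
below in carbon.conf (hostnames are placeholders; ports assume the defaults):

```ini
# carbon.conf on the relay -- sketch only, cache-0N hostnames are hypothetical
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
RELAY_METHOD = consistent-hashing
DESTINATIONS = cache-01:2004:a, cache-02:2004:a, cache-03:2004:a, cache-04:2004:a, cache-05:2004:a, cache-06:2004:a
```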

It's clear from that data that the cluster is I/O bound, which is no surprise 
since I/O on AWS instances is notoriously poor (unless you go with the pricey 
SSD instance type). The data volumes are RAID0 across the two ephemeral disks 
on an m1.large. It's becoming painful to rebalance the whisper data files when 
adding new instances, and three more cache instances would cost about the same 
as one SSD instance. FWIW, each cache node is doing around 600-700 IOPS.
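
As a back-of-the-envelope sanity check (assuming the relay spreads metrics 
evenly and each metric costs roughly one whisper write per minute), the 
per-node write rate lines up with the IOPS I'm seeing:

```python
# Rough check: total metrics/min spread across the cache nodes,
# converted to writes per second per node.
total_metrics_per_min = 240_000
cache_nodes = 6
seconds_per_min = 60

writes_per_sec_per_node = total_metrics_per_min / cache_nodes / seconds_per_min
print(round(writes_per_sec_per_node))  # ~667, in line with the observed 600-700 IOPS
```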

What is the best way to scale this cluster? Should I bite the bullet and fork 
out cash for an SSD, or is there something else I can do that I haven't thought 
of? 

-- 
You received this question notification because you are a member of
graphite-dev, which is an answer contact for Graphite.

_______________________________________________
Mailing list: https://launchpad.net/~graphite-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~graphite-dev
More help   : https://help.launchpad.net/ListHelp