Question #214631 on Graphite changed: https://answers.launchpad.net/graphite/+question/214631
Yee-Ting Li posted a new comment:
Hi,

Just to provide some comparisons: I currently use a Dell R610 (similar spec to your machines), but with 4x1TB local disks behind RAID 5. The performance is rather pitiful, mainly due to the sub-par RAID card (a MegaRAID SAS 1078). Looking at atop, I get around 200 writes/sec with about 50 reads/sec at the same time, far lower than the 500.

I currently push around half a million datapoints into carbon per minute. Each datapoint is actually unique (i.e. half a million different metrics are sent per minute). As you can imagine this is rather aggressive, but it is stable. I achieve this by running 16 carbon instances and doing client-side consistent hashing across the 16 separate instances, so each instance handles around 32,000 metrics per minute on average. (Using the relay pegs the CPU too much and hence limits the number of metrics it can receive.)

I have set each instance's MAX_CACHE_SIZE to around 10,000,000; however, some instances hover around only 1,000,000 on average (you can see this under carbon.agents.<agent>.cache.size). In theory I would prefer this to be much lower (i.e. less than 24MB memory divided by the number of instances) if it weren't for my observations below.

I have also set MAX_UPDATES_PER_SECOND to ~5 for each instance (i.e. 5*16 = 80 updates/sec in total, or 5*16*60 =~ 5,000 per minute), which is lower than the max writes/sec from atop. This is mainly so I do not starve the (kernel) flush process that writes data from cache to disk.

The real issue I have is that the consistent hashing isn't very balanced, so I end up with a couple of instances taking on ~50,000 (unique) metrics per minute. For some reason, this causes these specific instances to do far fewer updates/minute than the other instances, which causes their caches to grow and eventually fill up to the 10,000,000. I've noticed that instances appear to have issues when they reach over ~35,000 queues/minute. I deal with this problem by buffering at the client side.
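To make the balance problem above concrete, here is a minimal sketch of client-side consistent hashing with a per-instance load count. It is a toy ring, not carbon's own hashing code: the class name, the instance names (cache-00 to cache-15), the use of MD5, the replica count and the metric names are all assumptions made for illustration.

```python
import bisect
import hashlib
from collections import Counter


class HashRing:
    """Toy consistent-hash ring (hypothetical; not carbon's implementation)."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring `replicas` times (virtual nodes).
        self._ring = []
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._position(f"{node}:{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # The first 8 hex digits of the MD5 digest give a stable 32-bit position.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def get_node(self, metric):
        # Pick the first ring entry at or after the metric's position,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[idx % len(self._ring)][1]


def load_per_instance(ring, metrics):
    """Count how many metric names the ring assigns to each instance."""
    return Counter(ring.get_node(m) for m in metrics)


if __name__ == "__main__":
    instances = [f"cache-{i:02d}" for i in range(16)]  # assumed names
    ring = HashRing(instances)
    # Made-up metric names; substitute the real list to see the real skew.
    metrics = [f"servers.host{i}.cpu.user" for i in range(50_000)]
    load = load_per_instance(ring, metrics)
    mean = len(metrics) / len(instances)
    for name in instances:
        print(f"{name}: {load[name]:6d} metrics ({load[name] / mean:.2f}x mean)")
```

Running this against the real metric names would show which instances sit well above the mean (for example near the ~35,000 mark mentioned above). With a ring like this one, placing each node on the ring more times generally evens out the distribution, at the cost of a larger ring.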
I do use FLOW_CONTROL as a means to notify the specific client instance that it should start buffering, but I'm not sure this is completely wise.

My advice is to run as many instances of carbon-cache as possible to keep your queue sizes low. Also, if you configure the consistent hashing right, the webapp should hit the instance cache, so you /should/ get the data immediately after it's been fed into the carbon-cache.

-- 
You received this question notification because you are a member of graphite-dev, which is an answer contact for Graphite.

_______________________________________________
Mailing list: https://launchpad.net/~graphite-dev
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~graphite-dev
More help   : https://help.launchpad.net/ListHelp

