Question #285063 on Graphite changed: https://answers.launchpad.net/graphite/+question/285063
Will gave more information on the question:

Ok, I've made the following adjustments:

=====
/opt/graphite/bin/carbon.conf:

MAX_CACHE_SIZE=10000000
MAX_UPDATES_PER_SECOND=50000

=====
10MM metrics/minute divided by 60 seconds divided by 8 instances is about 21,000 metrics per instance per second, so 50000 should be more than enough.

=====
/opt/graphite/bin/ccrelay.conf:

cluster lga
    fnv1a_ch
        0.0.0.0:2013=a
        0.0.0.0:2113=b
        0.0.0.0:2213=c
        0.0.0.0:2313=d
        0.0.0.0:2413=e
        0.0.0.0:2513=f
        0.0.0.0:2613=g
        0.0.0.0:2713=h
    ;
match *
    send to lga
    ;

=====
ps output:

root 4996 77.9 7.1 12455596 9446004 ? Ssl 20:58 26:42 /opt/graphite/bin/relay -f /opt/graphite/bin/ccrelay.conf -l /opt/graphite/storage/log/ccrelay/ccrelay.log -S 1 -D -P /var/run/ccrelay.pid -q 150000000 -b 200000

=====
Graphs:

Graphite Stats: https://imgur.com/n28Q2Z5
Carbon-C-Relay Stats: https://imgur.com/ObHAum6

Looks like we could actually pare down the number of threads that carbon-c-relay runs, but it otherwise seems to be handling the load quite well. However, I have some concerns at this point:

1) Committed points is always < Metrics received in the Graphite stats.

2) The carbon-c-relay logfile occasionally shows this for a random instance:

(ERR) failed to write() to 10.201.12.199:2013: uncomplete write

The cache size on that instance approaches MAX_CACHE_SIZE within 15 minutes, and its RAM usage is significantly higher than that of the other instances. The message goes away after I kill and restart the process.

Not sure what to do here, but we've caused the caches to tap out faster than usual.
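The sizing arithmetic in the message can be pushed one step further to put numbers on the cache symptom. The Python sketch below is a back-of-the-envelope model, not anything taken from carbon itself, and it rests on assumptions: that the fnv1a_ch hash splits traffic evenly across the 8 instances (a real hash ring only approximates this), that MAX_CACHE_SIZE is counted in datapoints, and that "nearing MAX_CACHE_SIZE within 15 minutes" means filling from roughly empty. The helper names (per_instance_rate, net_growth_rate, seconds_until_full) are hypothetical.

```python
# Back-of-the-envelope carbon-cache capacity model (a sketch, not carbon code).
# Assumptions: traffic splits evenly across the 8 instances, MAX_CACHE_SIZE is
# counted in datapoints, and the observed cache started near empty.

METRICS_PER_MINUTE = 10_000_000  # total inbound rate quoted in the message
INSTANCES = 8                    # carbon-cache instances a..h in ccrelay.conf
MAX_CACHE_SIZE = 10_000_000      # from carbon.conf
MAX_UPDATES_PER_SECOND = 50_000  # from carbon.conf (a ceiling, not a guarantee)


def per_instance_rate(metrics_per_minute: float, instances: int) -> float:
    """Datapoints per second each instance receives under an even split."""
    return metrics_per_minute / 60 / instances


def net_growth_rate(cache_size: float, seconds: float) -> float:
    """Average net growth (points/s) that fills cache_size in `seconds`."""
    return cache_size / seconds


def seconds_until_full(inbound: float, drain: float, max_cache: float,
                       start: float = 0.0) -> float:
    """Seconds until the cache reaches max_cache; inf if it never grows."""
    growth = inbound - drain
    if growth <= 0:
        return float("inf")
    return (max_cache - start) / growth


if __name__ == "__main__":
    inbound = per_instance_rate(METRICS_PER_MINUTE, INSTANCES)
    print(f"inbound per instance: {inbound:,.0f} points/s")

    # The affected instance nears MAX_CACHE_SIZE in about 15 minutes.
    growth = net_growth_rate(MAX_CACHE_SIZE, 15 * 60)
    print(f"implied net cache growth: {growth:,.0f} points/s")

    # If it received only its even share, it is committing roughly this much.
    print(f"implied drain rate: {inbound - growth:,.0f} points/s")

    # Draining at the configured ceiling (treating one update as one point,
    # which is pessimistic) the cache would never fill.
    full = seconds_until_full(inbound, MAX_UPDATES_PER_SECOND, MAX_CACHE_SIZE)
    print("time to fill at the ceiling:", full)
```

Run as a script, it reports about 20,833 points/s inbound per instance, about 11,111 points/s of net cache growth, and an implied drain of about 9,722 points/s on the affected instance. The same gap underlies concern 1: whenever committed points stay below metrics received, the difference accumulates in the cache. Note that MAX_UPDATES_PER_SECOND caps whisper update calls rather than guaranteeing a drain rate, so a ceiling well above the inbound rate does not by itself show that the disks can keep up; the model also cannot tell whether the affected instance is slow to write or simply receives more than its even share.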