New question #276589 on Graphite: https://answers.launchpad.net/graphite/+question/276589
I am running a config of 3 servers behind a single load balancer. Servers A, B, and C each run a carbon-relay in front of 2 carbon-cache instances (one per CPU, as I have read recommended in other documentation). A rough sketch of the per-box layout is at the end of this mail.

I am seeing an issue where the same metrics are periodically missing and then get written later. Example:

-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 guest.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 idle.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 iowait.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 irq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 nice.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 softirq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 steal.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 system.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 user.wsp

You can see that the idle, nice, and system CPU metrics are all behind by 3 minutes. These metrics are delivered every 60 seconds and my storage-schemas rule matches that. This happens only on server A; servers B and C both have the metrics, and I am running the same configs on all 3 boxes.

One really interesting thing I have seen is that the cache-b logs show a lot of queries while the cache-a logs show none. Also, cache-a never shows a queue increase, whereas cache-b's queue grows to around 800. I have also been seeing fullQueueDrops but don't understand why.

On the disk side I am running on SSD and seeing the following from iostat (I can provide more info if needed):

-sh-4.2$ iostat -d 1
Linux 3.10.0-229.14.1.el7.x86_64 (ip-10-110-1-18)   12/03/2015   _x86_64_   (2 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda            486.49         5.27      2171.49    4232842 1744289310

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.00         8.00         0.00          8          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           2150.00         0.00      8600.00          0       8600

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           3051.00         0.00     12232.00          0      12232

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           2934.00         0.00     12984.00          0      12984

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           1056.00         0.00      4228.00          0       4228

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00

I am currently doing about 40k metrics per 60 seconds. I'm really confused about why the missing metrics are so consistent; I thought that if this were a queue or caching issue the affected metrics would be random. Any help and direction would really be appreciated.

Thanks.
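
P.S. Some extra detail in case it helps. The relay/cache split on each box looks roughly like the carbon.conf sketch below; the ports, instance names, and queue settings are illustrative placeholders, not copied from my actual config:

    [cache]
    # shared cache settings; both instances write into the same whisper tree
    LOCAL_DATA_DIR = /opt/graphite/storage/whisper/

    [cache:a]
    LINE_RECEIVER_PORT = 2003
    PICKLE_RECEIVER_PORT = 2004
    CACHE_QUERY_PORT = 7002

    [cache:b]
    LINE_RECEIVER_PORT = 2103
    PICKLE_RECEIVER_PORT = 2104
    CACHE_QUERY_PORT = 7102

    [relay]
    LINE_RECEIVER_PORT = 2013
    PICKLE_RECEIVER_PORT = 2014
    RELAY_METHOD = consistent-hashing
    DESTINATIONS = 127.0.0.1:2004:a, 127.0.0.1:2104:b
    MAX_QUEUE_SIZE = 10000
    USE_FLOW_CONTROL = True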
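
The storage-schemas rule covering these CPU metrics is along these lines; the pattern and the retention tiers here are approximate, the relevant part is just the 60-second first tier:

    [cpu]
    pattern = \.cpu\.
    retentions = 60s:30d,5m:1y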
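
A quick way to confirm whether the points are actually absent from the files (rather than just being flushed late) is to read the tail of each .wsp with the whisper Python module. A minimal sketch, with an illustrative directory path:

    # check_wsp.py - print the last 5 minutes of datapoints for each CPU whisper file
    import os
    import time
    import whisper

    # illustrative path; point this at the affected host's cpu directory on server A
    WSP_DIR = '/opt/graphite/storage/whisper/servers/serverA/cpu'

    now = int(time.time())
    for name in sorted(os.listdir(WSP_DIR)):
        if not name.endswith('.wsp'):
            continue
        # whisper.fetch returns ((start, end, step), values); None in values means no point stored
        result = whisper.fetch(os.path.join(WSP_DIR, name), now - 300, now)
        if result is None:
            print('%s: nothing in range' % name)
            continue
        (start, end, step), values = result
        print('%s: %s' % (name, values))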
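
For watching the cache queue sizes and the fullQueueDrops counters, carbon's self-instrumentation can be pulled through the render API. Another minimal sketch; the host and the carbon.* metric paths depend on hostname and instance naming, so the wildcard targets below may need adjusting:

    # carbon_stats.py - pull carbon's self-reported counters through the render API
    import json
    try:
        from urllib.request import urlopen   # python 3
    except ImportError:
        from urllib2 import urlopen           # python 2

    GRAPHITE = 'http://localhost'   # graphite-web on the box being checked (illustrative)
    TARGETS = [
        'carbon.agents.*.cache.size',
        'carbon.agents.*.metricsReceived',
        'carbon.relays.*.destinations.*.fullQueueDrops',
    ]

    for target in TARGETS:
        url = '%s/render?target=%s&from=-1h&format=json' % (GRAPHITE, target)
        for series in json.load(urlopen(url)):
            # keep the most recent non-null value of each matched series
            points = [v for v, ts in series['datapoints'] if v is not None]
            print('%s -> %s' % (series['target'], points[-1] if points else 'no data'))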