New question #276589 on Graphite:
https://answers.launchpad.net/graphite/+question/276589

I am running a setup of 3 servers behind a single load balancer.  Servers A, B, 
and C each run a carbon-relay in front of 2 carbon-caches (one per CPU, as 
recommended in other documentation).  I am seeing an issue where the same set 
of metrics is periodically missing and then written later.
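For context, each box is laid out roughly like this (the instance names and ports below are illustrative, not my exact config):

```ini
# carbon.conf (sketch -- ports and instance names are illustrative)
[cache:a]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102

[cache:b]
LINE_RECEIVER_PORT = 2203
PICKLE_RECEIVER_PORT = 2204
CACHE_QUERY_PORT = 7202

[relay]
LINE_RECEIVER_PORT = 2003
RELAY_METHOD = consistent-hashing
DESTINATIONS = 127.0.0.1:2104:a, 127.0.0.1:2204:b
```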

example:
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 guest.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 idle.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 iowait.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 irq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 nice.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 softirq.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 steal.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:31 system.wsp
-rw-r--r-- 1 graphite graphite 224680 Dec  3 03:34 user.wsp

You can see that the idle, nice, and system cpu metrics are all behind by 3 
minutes.  These metrics are delivered every 60 seconds, and my storage-schemas 
retention matches that.
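For reference, the matching schema rule looks roughly like this (the pattern here is illustrative, not my exact rule):

```ini
# storage-schemas.conf (sketch -- pattern is illustrative)
[cpu]
pattern = \.cpu\.
retentions = 60s:30d
```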

This only happens on server A.  Servers B and C both have the metrics, and I am 
running the same configs on all 3 boxes.  One really interesting thing I have 
seen is that the cache-b logs show a lot of queries while the cache-a logs show 
none.  Also, cache-a never shows a queue increase, whereas cache-b's queue 
grows to 800.  I have also been seeing fullQueueDrops but don't understand why.
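As I understand it, consistent-hashing in the relay pins each metric name to one cache instance, which would at least explain why it is always the same metrics that are affected. A simplified sketch of that idea (illustrative only, not carbon's actual ConsistentHashRing implementation):

```python
import hashlib

def bucket(metric, instances):
    """Map a metric name to one cache instance via a stable hash.

    Simplified stand-in for carbon's consistent-hashing ring --
    the point is only that the mapping is deterministic, so the
    same metric always lands on the same cache instance.
    """
    h = int(hashlib.md5(metric.encode()).hexdigest(), 16)
    return instances[h % len(instances)]

instances = ["cache:a", "cache:b"]
for name in ["cpu.idle", "cpu.nice", "cpu.system", "cpu.user"]:
    print(name, "->", bucket(name, instances))
```

Because the mapping is stable, if one instance is stalled or dropping, the same subset of metrics will be late every time rather than a random subset.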

On the disk side I am running on SSDs and seeing the following from iostat.  I 
can provide more info if needed.

-sh-4.2$ iostat -d 1
Linux 3.10.0-229.14.1.el7.x86_64 (ip-10-110-1-18)       12/03/2015      
_x86_64_        (2 CPU)

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda            486.49         5.27      2171.49    4232842 1744289310

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              1.00         8.00         0.00          8          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           2150.00         0.00      8600.00          0       8600

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           3051.00         0.00     12232.00          0      12232

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           2934.00         0.00     12984.00          0      12984

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda           1056.00         0.00      4228.00          0       4228

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00       

I am currently writing about 40k metrics per 60 seconds.  I'm really confused 
as to why it is consistently the same metrics that go missing.  I thought that 
if this were a queue or caching issue, the missing metrics would be random.  
Any help and direction would really be appreciated.
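For scale, my back-of-envelope arithmetic on the write load (assuming whisper's 12 bytes per point and one update per metric per interval; these are estimates, not measurements):

```python
# back-of-envelope: 40k metrics per 60s, one 12-byte whisper point each
metrics = 40_000
interval_s = 60
point_bytes = 12  # whisper stores each (timestamp, value) point in 12 bytes

updates_per_s = metrics / interval_s
payload_kb_per_s = updates_per_s * point_bytes / 1024

print(f"{updates_per_s:.0f} updates/s, ~{payload_kb_per_s:.1f} kB/s of point data")
```

The kB_wrtn bursts in my iostat output are far higher than the raw point payload, which I assume is because each whisper update rewrites at least a filesystem block per file rather than just the 12-byte point.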
Thanks.
