New question #660458 on Graphite:
https://answers.launchpad.net/graphite/+question/660458

We are seeing missing metrics (null) when graphite-web queries the carbon 
cache: some data is retrieved from the cache and displayed, but part of it 
comes back as null.  Once everything is flushed to disk, all metrics appear.  
Both members of the cluster exhibit the same behavior.

We have a new Graphite cluster of two servers.  Each server runs carbon-relay, 
one carbon-cache instance, and graphite-web sitting behind a load balancer.  
We are using consistent-hashing with a REPLICATION_FACTOR of 2 so that both 
servers have the same data.
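
To illustrate why we expect both servers to hold the same data, here is a 
minimal sketch of consistent hashing with replication.  It is a simplified, 
hypothetical ring and not carbon's actual implementation; the helper names 
and the REPLICAS_PER_NODE value are my own assumptions.  The point is only 
that with two destinations and a replication factor of 2, every metric name 
maps to both destinations.

```python
import bisect
import hashlib

# Destinations and replication factor as in our [relay] section.
DESTINATIONS = ["127.0.0.1:2005", "10.1.1.12:2005"]
REPLICATION_FACTOR = 2
REPLICAS_PER_NODE = 100  # virtual nodes per destination (assumed value)


def _position(key):
    """Map a string to an integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)


def build_ring(nodes):
    """Return a sorted list of (position, node) entries."""
    ring = []
    for node in nodes:
        for i in range(REPLICAS_PER_NODE):
            ring.append((_position("%s:%d" % (node, i)), node))
    ring.sort()
    return ring


def destinations_for(ring, metric, replication_factor):
    """Walk the ring from the metric's position and collect distinct nodes."""
    distinct = len({node for _, node in ring})
    wanted = min(replication_factor, distinct)
    start = bisect.bisect_left(ring, (_position(metric), ""))
    chosen = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == wanted:
            break
    return chosen


ring = build_ring(DESTINATIONS)
metric = "web.servers.web.HOST1.perf.processor.pct_processor_time"
targets = destinations_for(ring, metric, REPLICATION_FACTOR)
print(sorted(targets))
```

Whatever position the metric hashes to, the walk collects two distinct 
destinations, so each relay should forward every datapoint to both caches.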

Carbon-cache settles between 100k and 300k metrics; the fewer metrics in the 
cache, the fewer are missing.  Restarting carbon-cache masks the issue for a 
few hours while the cache grows.  We are writing ~275k metrics per minute.
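
A rough back-of-the-envelope check on these numbers, in case it helps.  This 
is only arithmetic under two assumptions of mine: that the 100k-300k figure 
is the number of datapoints held in the cache, and that each whisper update 
writes all queued points for one metric, so MAX_UPDATES_PER_SECOND (750 in 
the carbon.conf further down) caps metric files written per second.

```python
# Figures taken from this post; their interpretation is an assumption.
DATAPOINTS_PER_MINUTE = 275000      # "~275k metrics per minute"
MAX_UPDATES_PER_SECOND = 750        # from the [cache] section of carbon.conf
CACHE_SIZES = (100000, 300000)      # observed range of the cache size

# Incoming datapoints per second.
ingest_per_second = DATAPOINTS_PER_MINUTE / 60.0

# Average points each update must carry for writes to keep up with ingest.
points_per_update = ingest_per_second / MAX_UPDATES_PER_SECOND

# Seconds of incoming data that the observed cache sizes correspond to.
backlog_seconds = [size / ingest_per_second for size in CACHE_SIZES]

print("ingest: %.0f datapoints/s" % ingest_per_second)
print("points per update to keep up: %.1f" % points_per_update)
print("cache holds roughly %.0f to %.0f seconds of data" % tuple(backlog_seconds))
```

Under those assumptions the cache holds somewhere between about 20 and 65 
seconds of incoming data at any time.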

The server referenced below uploads metrics every 5 seconds, matching what is 
set in storage-schemas.conf.  We have other servers that upload metrics every 
30 seconds and the behavior is similar.

Attached below are some of the configs and logs illustrating the issue.  If any 
other info is needed, please let me know.  Any assistance is greatly 
appreciated.

tail -f query.log
07/11/2017 12:40:43 :: [127.0.0.1:36494] cache query for "web.servers.web.HOST1.perf.processor.pct_processor_time" returned 7 values

In the output below, the oldest ~16 metrics have been persisted to disk and all 
show up correctly in graphite-web.  They are followed by ~21 null metrics that 
are not being pulled from the cache, although I presume they are there since 
they get written to disk a few moments later.  Finally, the newest 7 metrics 
are pulled from the cache and shown in graphite-web correctly.

curl "http://SERVER1/render/?target=web.servers.web.HOST1.perf.processor.pct_processor_time&format=json"
[7.0, 1510079840], [6.0, 1510079845], [4.0, 1510079850], [5.0, 1510079855], 
[5.0, 1510079860], [8.0, 1510079865], [5.0, 1510079870], [6.0, 1510079875], 
[9.0, 1510079880], [5.0, 1510079885], [4.0, 1510079890], [3.0, 1510079895], 
[4.0, 1510079900], [3.0, 1510079905], [null, 1510079910], [null, 1510079915], 
[null, 1510079920], [null, 1510079925], [null, 1510079930], [null, 1510079935], 
[null, 1510079940], [null, 1510079945], [null, 1510079950], [null, 1510079955], 
[null, 1510079960], [null, 1510079965], [null, 1510079970], [null, 1510079975], 
[null, 1510079980], [null, 1510079985], [null, 1510079990], [null, 1510079995], 
[null, 1510080000], [null, 1510080005], [4.0, 1510080010], [8.0, 1510080015], 
[6.0, 1510080020], [6.0, 1510080025], [4.0, 1510080030], [5.0, 1510080035], 
[4.0, 1510080040]],
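
To make the gap easier to see, here is a small sketch that groups the 
datapoints from the render output above into consecutive runs of data and 
nulls.  The function name summarize_runs is my own; the values are copied 
from the excerpt as pasted, and the timestamps are rebuilt from the first 
timestamp and the 5-second step.

```python
from itertools import groupby

STEP = 5  # seconds between points in the excerpt

# Values copied from the render output above, oldest first.
values = (
    [7.0, 6.0, 4.0, 5.0, 5.0, 8.0, 5.0, 6.0, 9.0, 5.0, 4.0, 3.0, 4.0, 3.0]
    + [None] * 20
    + [4.0, 8.0, 6.0, 6.0, 4.0, 5.0, 4.0]
)
first_ts = 1510079840
datapoints = [[v, first_ts + i * STEP] for i, v in enumerate(values)]


def summarize_runs(points):
    """Group consecutive [value, timestamp] pairs into null / non-null runs."""
    runs = []
    for is_null, group in groupby(points, key=lambda p: p[0] is None):
        group = list(group)
        runs.append({
            "null": is_null,
            "count": len(group),
            "start": group[0][1],
            "end": group[-1][1],
        })
    return runs


for run in summarize_runs(datapoints):
    label = "null" if run["null"] else "data"
    print("%s x%d  %d..%d" % (label, run["count"], run["start"], run["end"]))
```

On the excerpt as pasted this gives 14 points of data, then 20 nulls (about 
100 seconds at a 5-second step), then the 7 points the cache query returned.  
The ~16 and ~21 quoted above were estimates.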

python ~/whisper-info.py /opt/graphite/storage/whisper/web/servers/web/HOST1/perf/processor/pct_processor_time.wsp
maxRetention: 63072000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 10172212

Archive 0
retention: 2592000
secondsPerPoint: 5
points: 518400
size: 6220800
offset: 52

Archive 1
retention: 15552000
secondsPerPoint: 60
points: 259200
size: 3110400
offset: 6220852

Archive 2
retention: 63072000
secondsPerPoint: 900
points: 70080
size: 840960
offset: 9331252
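
As a sanity check that the file layout is consistent with the retentions, the 
whisper-info numbers above can be recomputed from secondsPerPoint and points 
alone.  This sketch assumes the on-disk struct formats as I understand them 
from whisper.py (a 16-byte metadata header, 12 bytes per archive header, and 
12 bytes per point); the function name whisper_layout is my own.

```python
import struct

# Struct formats as I understand them from whisper.py (assumption).
METADATA_SIZE = struct.calcsize("!2LfL")    # file header
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # one header per archive
POINT_SIZE = struct.calcsize("!Ld")         # timestamp + value

# (secondsPerPoint, points) for archives 0, 1 and 2 from whisper-info.
ARCHIVES = [(5, 518400), (60, 259200), (900, 70080)]


def whisper_layout(archives):
    """Compute retention, size and offset per archive, plus total file size."""
    offset = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    rows = []
    for seconds_per_point, points in archives:
        size = points * POINT_SIZE
        rows.append({
            "retention": seconds_per_point * points,
            "size": size,
            "offset": offset,
        })
        offset += size
    return rows, offset  # the final offset equals the file size


rows, file_size = whisper_layout(ARCHIVES)
for i, row in enumerate(rows):
    print("Archive %d: %s" % (i, row))
print("fileSize: %d" % file_size)
```

The computed retentions, sizes, offsets and fileSize match the whisper-info 
output, so the file itself looks consistent with a 5-second first archive.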

carbon.conf
[cache]
DATABASE = whisper
ENABLE_LOGROTATION = True
USER =
MAX_CACHE_SIZE = 5000000
MAX_UPDATES_PER_SECOND = 750
MAX_CREATES_PER_MINUTE = 500
MIN_TIMESTAMP_RESOLUTION = 1
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2004
ENABLE_UDP_LISTENER = True
UDP_RECEIVER_INTERFACE = 0.0.0.0
UDP_RECEIVER_PORT = 2004
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2005
USE_INSECURE_UNPICKLER = False
CACHE_QUERY_INTERFACE = 0.0.0.0
CACHE_QUERY_PORT = 7002
USE_FLOW_CONTROL = True
LOG_UPDATES = False
LOG_CREATES = False
LOG_CACHE_HITS = True
LOG_CACHE_QUEUE_SORTS = True
CACHE_WRITE_STRATEGY = sorted
WHISPER_AUTOFLUSH = False
WHISPER_FALLOCATE_CREATE = True
CARBON_METRIC_PREFIX = carbon
CARBON_METRIC_INTERVAL = 60
GRAPHITE_URL = http://127.0.0.1:80
[relay]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2003
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2014
ENABLE_UDP_LISTENER = True
UDP_RECEIVER_INTERFACE = 0.0.0.0
UDP_RECEIVER_PORT = 2003
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
DESTINATIONS = 127.0.0.1:2005, 10.1.1.12:2005
MAX_QUEUE_SIZE = 100000
MAX_DATAPOINTS_PER_MESSAGE = 500
QUEUE_LOW_WATERMARK_PCT = 0.8
TIME_TO_DEFER_SENDING = 0.0001
USE_FLOW_CONTROL = True
USE_RATIO_RESET=False
MIN_RESET_STAT_FLOW=1000
MIN_RESET_RATIO=0.9
MIN_RESET_INTERVAL=121
[aggregator]
LINE_RECEIVER_INTERFACE = 0.0.0.0
LINE_RECEIVER_PORT = 2023
PICKLE_RECEIVER_INTERFACE = 0.0.0.0
PICKLE_RECEIVER_PORT = 2024
FORWARD_ALL = True
DESTINATIONS = 127.0.0.1:2004
REPLICATION_FACTOR = 1
MAX_QUEUE_SIZE = 10000
USE_FLOW_CONTROL = True
MAX_DATAPOINTS_PER_MESSAGE = 500
MAX_AGGREGATION_INTERVALS = 5

local_settings.py
SECRET_KEY = 'Edited'
CLUSTER_SERVERS = ["10.1.1.12:80"]
REMOTE_FIND_TIMEOUT = 3.0           # Timeout for metric find requests
REMOTE_FETCH_TIMEOUT = 3.0          # Timeout to fetch series data
REMOTE_RETRY_DELAY = 60.0           # Time before retrying a failed remote webapp
