New question #187466 on Graphite:
https://answers.launchpad.net/graphite/+question/187466
I am having a number of problems trying to upgrade from 0.9.8 to 0.9.9.
I am using diamond to send my metrics into Graphite, and that part of the setup hasn't changed.
The metrics were arriving correctly prior to the upgrade.
I have it configured like so...
1. Each diamond server sends to a central relay in each datacenter.
2. The relay sends the metrics along to a cache server in the main datacenter.
When I start the relay I can see diamond connecting (pickle receiver port 2014).
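For reference, carbon's pickle receiver expects each message to be a 4-byte big-endian length header followed by a pickled list of (path, (timestamp, value)) tuples. The following is a minimal sketch of that framing, not diamond's actual code; the function names and the sample datapoint (timestamp and value) are made up for illustration.

```python
import pickle
import struct


def frame_pickle_message(datapoints):
    """Build one pickle-protocol message: 4-byte length header + payload."""
    # protocol=2 keeps the payload readable by Python 2 receivers.
    payload = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(payload))
    return header + payload


def unframe_pickle_message(message):
    """Reverse of frame_pickle_message, roughly what a receiver has to do."""
    (length,) = struct.unpack("!L", message[:4])
    return pickle.loads(message[4:4 + length])


# Hypothetical datapoint: (metric path, (unix timestamp, value)).
sample = [("Infrastructure.servers.HC.aghc01.loadavg.1minute", (1328920790, 0.42))]
message = frame_pickle_message(sample)
```

Writing `message` to a TCP connection on the relay's pickle port (2014 in this setup) is, under these assumptions, what a sender such as diamond would be doing.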
Starting carbon-relay (instance a)
11/02/2012 00:39:50 :: [console] Log opened.
11/02/2012 00:39:50 :: [console] twistd 11.0.0 (/usr/local/rnt/bin/python 2.6.4) starting up.
11/02/2012 00:39:50 :: [console] reactor class: twisted.internet.epollreactor.EPollReactor.
11/02/2012 00:39:50 :: [console] twisted.internet.protocol.ServerFactory starting on 2013
11/02/2012 00:39:50 :: [console] Starting factory <twisted.internet.protocol.ServerFactory instance at 0x906912c>
11/02/2012 00:39:50 :: [console] twisted.internet.protocol.ServerFactory starting on 2014
11/02/2012 00:39:50 :: [console] Starting factory <twisted.internet.protocol.ServerFactory instance at 0x9069a8c>
11/02/2012 00:39:50 :: [console] Starting factory CarbonClientFactory(10.60.31.84:2004:None)
11/02/2012 00:39:50 :: [clients] CarbonClientFactory(10.60.31.84:2004:None)::startedConnecting (10.60.31.84:2004)
11/02/2012 00:39:50 :: [clients] CarbonClientProtocol(10.60.31.84:2004:None)::connectionMade
11/02/2012 00:39:53 :: [listener] MetricPickleReceiver connection with 10.60.31.4:58176 established
11/02/2012 00:40:06 :: [listener] MetricPickleReceiver connection with 10.60.35.10:33553 established
11/02/2012 00:40:09 :: [listener] MetricPickleReceiver connection with 10.60.36.11:50424 established
11/02/2012 00:40:10 :: [listener] MetricPickleReceiver connection with 10.60.35.33:48027 established
11/02/2012 00:40:35 :: [listener] MetricPickleReceiver connection with 10.60.31.84:39403 established
11/02/2012 00:40:36 :: [listener] MetricPickleReceiver connection with 10.60.35.59:41129 established
11/02/2012 00:40:50 :: [console] Unhandled error in Deferred:
11/02/2012 00:40:50 :: [console] Unhandled Error
Traceback (most recent call last):
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 1162, in run
    self.mainLoop()
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 1171, in mainLoop
    self.runUntilCurrent()
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/base.py", line 793, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/task.py", line 194, in __call__
    d = defer.maybeDeferred(self.f, *self.a, **self.kw)
--- <exception caught here> ---
  File "/usr/local/rnt/lib/python2.6/site-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/rnt/lib/python2.6/site-packages/carbon/instrumentation.py", line 104, in recordMetrics
    record('metricsReceived', myStats.get('metricsReceived', 0))
exceptions.UnboundLocalError: local variable 'record' referenced before assignment
I also keep getting the error above. It looks like it comes from the section of
code that reports carbon's internal metrics, although it appears to happen only
once.
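An UnboundLocalError of this kind generally means a local name was assigned only on some code paths and then used on a path that skipped the assignment. The sketch below reproduces that pattern in isolation. It is not the actual carbon instrumentation.py code; the function names and the branch conditions are assumptions chosen purely to illustrate the failure mode.

```python
def record_metrics(program):
    """Hypothetical stand-in: pick a recorder based on the program name."""

    def cache_record(name, value):
        return ("cache", name, value)

    def relay_record(name, value):
        return ("relay", name, value)

    if program == "carbon-cache":
        record = cache_record
    elif program == "carbon-relay":
        record = relay_record
    # No else branch: any other program name leaves `record` unassigned,
    # so the call below raises UnboundLocalError.
    return record("metricsReceived", 0)


def try_record(program):
    """Call record_metrics and report an UnboundLocalError instead of raising."""
    try:
        return record_metrics(program)
    except UnboundLocalError as exc:
        return "UnboundLocalError: %s" % exc
```

If the real code has a similar shape, checking which value the selecting condition actually sees at startup would be a reasonable first step.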
So it appears that diamond and the relay are talking to each other OK, and it
looks like the relay and the cache are talking to each other as well.
But I never see any updates on the cache server.
Incidentally, the query below should be for
"Infrastructure.servers.HC.aghc01.tcp.TCPPureAcks" etc., but it looks like the
first part of the metric name is getting cut off.
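The size of the truncation can be measured with a short sketch, assuming the full metric name begins with "Infrastructure" and comparing it with the name that appears in the query log. The helper name is made up for illustration.

```python
def dropped_prefix(expected, logged):
    """Return the leading part of `expected` that is missing from `logged`.

    Returns None when `logged` is not a suffix of `expected`.
    """
    if not expected.endswith(logged):
        return None
    return expected[:len(expected) - len(logged)]


expected = "Infrastructure.servers.HC.aghc01.tcp.TCPPureAcks"
logged = "astructure.servers.HC.aghc01.tcp.TCPPureAcks"
missing = dropped_prefix(expected, logged)
print(repr(missing), len(missing))  # prints: 'Infr' 4
```

Exactly four characters are missing. Four bytes is also the size of the length header used by carbon's length-prefixed protocols, which may or may not be a coincidence.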
==> /var/log/graphite/query.log <==
11/02/2012 00:41:50 :: [127.0.0.1:57739] cache query for
"astructure.servers.HC.aghc01.tcp.TCPPureAcks" returned 0 values
11/02/2012 00:41:50 :: [127.0.0.1:57739] cache query for
"astructure.servers.HC.aghc01.loadavg.1minute" returned 0 values
Also the console log on the cache server only ever updates the 13 internal
carbon metrics.
==> /var/log/graphite/console.log <==
11/02/2012 00:50:02 :: Sorted 13 cache queues in 0.000044 seconds
If I remove the carbon directory where the whisper files live, it gets recreated
and I see the tree for all of my old metrics in the web UI, but the whisper files
never get updated and consequently neither do the graphs.
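One way to confirm which whisper files are not being written is to compare their modification times against the current time. The following is a minimal sketch under stated assumptions: the function name, the 300-second threshold, and the storage path passed in are all hypothetical, and it relies only on whisper files carrying the .wsp extension.

```python
import os
import time


def stale_whisper_files(root, max_age_seconds=300, now=None):
    """Return sorted (path, age_in_seconds) pairs for .wsp files under `root`
    whose last modification is older than `max_age_seconds`."""
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".wsp"):
                continue  # ignore anything that is not a whisper file
            path = os.path.join(dirpath, filename)
            age = now - os.path.getmtime(path)
            if age > max_age_seconds:
                stale.append((path, age))
    return sorted(stale)
```

Calling `stale_whisper_files("/path/to/whisper")` (path is a placeholder) a few minutes after starting the relay and cache would list every metric file that has not been touched since.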
I'm not sure where it went wrong but it is definitely off the rails.
One other thing: I've been rolling my own RPMs. I completely removed all of the
old 0.9.8 code from the source and put only the 0.9.9 source in the new RPMs. I
then removed all of the installed RPMs for graphite-web, carbon, and whisper and
did fresh installs with the new ones. Any idea where I should start looking?
Thanks!
Cody