Hello,
since we upgraded to Luminous (12.2.2), we have been using the built-in Ceph
exporter (the ceph-mgr prometheus module) to get the Ceph metrics into
Prometheus. At random times the exporter returns an Internal Server Error,
and Python raises a KeyError for some metric, most often one named "pg_*".
Here is an example of the Python exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cprequest.py", line 670, in respond
    response.body = self.handler()
  File "/usr/lib/python2.7/dist-packages/cherrypy/lib/encoding.py", line 217, in __call__
    self.body = self.oldhandler(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/cherrypy/_cpdispatch.py", line 61, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 386, in metrics
    metrics = global_instance().collect()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 324, in collect
    self.get_pg_status()
  File "/usr/lib/ceph/mgr/prometheus/module.py", line 266, in get_pg_status
    self.metrics[path].set(value)
KeyError: 'pg_deep'
After some time (anywhere from 3-5 minutes to as long as 40 minutes), the
exporter starts serving metrics again on its own, without any intervention.
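To illustrate one way the traceback above could arise, here is a minimal, self-contained sketch. It assumes (this is not verified against the Ceph source) that the exporter pre-registers one gauge per known PG state and splits compound states such as "active+clean+scrubbing+deep" on "+", so a component like "deep" with no registered gauge ends in the same KeyError: 'pg_deep'. If that assumption holds, it would also fit the intermittent pattern, since the error would only show up while a deep scrub is running. Only the get_pg_status name and the metrics[path].set(value) pattern come from the traceback; KNOWN_PG_STATES, the Gauge class and the strict flag are hypothetical.

```python
# Hypothetical stand-in for the exporter's pre-registered PG state gauges.
# The real list lives in /usr/lib/ceph/mgr/prometheus/module.py; this one
# is illustrative only and deliberately lacks "deep".
KNOWN_PG_STATES = ["active", "clean", "scrubbing", "peering", "degraded"]


class Gauge(object):
    """Minimal stand-in for a Prometheus gauge (hypothetical)."""

    def __init__(self):
        self.value = None

    def set(self, value):
        self.value = value


metrics = {"pg_{}".format(s): Gauge() for s in KNOWN_PG_STATES}


def get_pg_status(pg_summary, strict=True):
    """Split compound PG states into components and update one gauge each.

    pg_summary maps compound states ("active+clean+scrubbing+deep") to PG
    counts. With strict=True an unregistered component raises KeyError, as
    in the traceback; with strict=False it is skipped and returned instead.
    """
    # Sum counts per component, since one component (e.g. "active") can
    # appear in several compound states.
    totals = {}
    for compound_state, count in pg_summary.items():
        for state in compound_state.split("+"):
            totals[state] = totals.get(state, 0) + count

    skipped = []
    for state, total in sorted(totals.items()):
        path = "pg_" + state
        if path in metrics:
            metrics[path].set(total)
        elif strict:
            raise KeyError(path)
        else:
            skipped.append(state)
    return skipped


if __name__ == "__main__":
    summary = {"active+clean": 120, "active+clean+scrubbing+deep": 8}
    try:
        get_pg_status(summary)
    except KeyError as e:
        print("KeyError:", e)  # KeyError: 'pg_deep'
    print(get_pg_status(summary, strict=False))  # ['deep']
```

The non-strict path only shows a defensive pattern (skip unknown states instead of failing the whole scrape); whether the actual module should register "deep" or skip unknown states is a question for the module's maintainers.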
Does anyone have an idea what could be done about this, or has anyone
experienced similar problems?
Thanks,
Falk
_______________________________________________
ceph-users mailing list
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com