Hi folks,
I'm currently chewing on an issue regarding "slow requests are blocked". I'd
like to identify the OSD that caused those events after the cluster is back
to HEALTH_OK (as I have no monitoring yet that would capture this info in
realtime).
Collecting this information could help identify aging disks: if you could
accumulate and analyze which OSDs had blocked requests in the past, and how
often those events occur, a failing disk should stand out.
My research so far leads me to think that this information is only available
as long as the requests are actually blocked. Is this correct?
MON logs only show that those events occur and how many requests are blocked,
but give no indication of which OSD is affected. Is there a way to identify
blocked requests from the OSD log files?
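
To make the OSD log idea concrete, here is a minimal sketch of the kind of
after-the-fact analysis meant above. It rests on two unverified assumptions:
that each OSD log contains lines with the phrase "slow request" while requests
are blocked, and that the log file name encodes the OSD id (for example
ceph-osd.12.log). The function name count_slow_requests is hypothetical.

```python
import re
from collections import Counter

# Assumption: log file names look like "ceph-osd.<id>.log".
OSD_ID_RE = re.compile(r"ceph-osd\.(\d+)\.log")

# Assumption: OSD logs mention blocked requests with this phrase.
SLOW_MARKER = "slow request"


def count_slow_requests(log_lines_by_file):
    """Return a Counter mapping OSD id -> number of matching log lines.

    log_lines_by_file maps a log file name to an iterable of its lines.
    Files whose name does not match OSD_ID_RE are skipped.
    """
    counts = Counter()
    for filename, lines in log_lines_by_file.items():
        match = OSD_ID_RE.search(filename)
        if not match:
            continue
        osd_id = int(match.group(1))
        hits = sum(1 for line in lines if SLOW_MARKER in line)
        if hits:
            counts[osd_id] += hits
    return counts
```

Feeding it the open log files of one host (an open file yields its lines) and
sorting the result by count would show which OSDs logged the most such lines
over the retained log period.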
On a side note: I was trying to write a small Python script that would extract
this kind of information in realtime. I was able to register a MonitorLog
callback that receives the same messages you would get with "ceph -w", but I
haven't found anything in the librados Python bindings documentation that does
the equivalent of "ceph health detail". Any suggestions on how to get the
blocking OSDs via librados?
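
One possible direction, as a rough sketch only: the Python bindings expose
Rados.mon_command(), which sends a JSON-encoded command to the monitors. The
sketch assumes that {"prefix": "health", "detail": "detail"} is the payload
corresponding to "ceph health detail", and that the detailed output contains
lines of the form "N ops are blocked > S sec on osd.X". Both may differ
between Ceph releases. The helper names and the default conffile path are
hypothetical.

```python
import json
import re

# Assumed wording of a per-OSD line in the detailed health output.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on osd\.(\d+)")


def health_detail_command():
    """JSON payload assumed to be equivalent to 'ceph health detail'."""
    return json.dumps({"prefix": "health", "detail": "detail"})


def parse_blocked_osds(text):
    """Return {osd_id: total blocked ops} found in health detail text."""
    blocked = {}
    for count, _seconds, osd in BLOCKED_RE.findall(text):
        osd_id = int(osd)
        blocked[osd_id] = blocked.get(osd_id, 0) + int(count)
    return blocked


def fetch_blocked_osds(conffile="/etc/ceph/ceph.conf"):
    """Ask the monitors for detailed health and extract blocking OSDs."""
    import rados  # imported here so the parser works without librados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ret, outbuf, outs = cluster.mon_command(health_detail_command(), b"")
        if ret != 0:
            raise RuntimeError("health command failed: %s" % outs)
        # Unsure which buffer carries the text, so scan both.
        text = outbuf.decode("utf-8", "replace") + "\n" + outs
        return parse_blocked_osds(text)
    finally:
        cluster.shutdown()
```

Calling fetch_blocked_osds() periodically, or from the MonitorLog callback
whenever a slow request warning arrives, and appending the result to a file
would give the history per OSD. The parsing is kept separate from the cluster
call so it can be checked against captured output without a running cluster.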
Thanks,
Uwe
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com