Hi folks, I'm currently chewing on an issue regarding "slow requests are blocked". I'd like to identify the OSD that caused those events after the cluster is back to HEALTH_OK (I have no monitoring yet that would capture this information in real time).
Collecting this information could help identify aging disks: you could accumulate and analyze which OSDs had blocked requests in the past and how often those events occur.

My research so far leads me to think that this information is only available while the requests are actually blocked. Is this correct? The MON logs only show that those events occur and how many requests are in a blocked state, but give no indication of which OSD is affected. Is there a way to identify blocked requests from the OSD log files?

On a side note: I was trying to write a small Python script that would extract this kind of information in real time. I was able to register a MonitorLog callback that receives the same messages you would get with "ceph -w", but I haven't seen anything in the librados Python bindings documentation that would do the equivalent of "ceph health detail". Any suggestions on how to get the blocking OSDs via librados?

Thanks,
Uwe
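For illustration, here is a minimal sketch of how the "ceph health detail" equivalent might look via librados, using the generic `mon_command` call from the Python bindings. Several things here are assumptions rather than confirmed behaviour: that the monitor accepts the JSON command `{"prefix": "health", "detail": "detail"}`, that the plain-text reply contains lines such as "30 ops are blocked > 32.768 sec on osd.5", and that the config file lives at /etc/ceph/ceph.conf. The helper names `parse_blocked_osds` and `fetch_health_detail` are hypothetical.

```python
import json
import re

# Assumed line format in the plain-text health detail output, e.g.
#   "30 ops are blocked > 32.768 sec on osd.5"
# This may differ between Ceph releases.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > ([\d.]+) sec on (osd\.\d+)")


def parse_blocked_osds(health_detail_text):
    """Return {osd_name: total_blocked_ops} found in the given text."""
    blocked = {}
    for count, _seconds, osd in BLOCKED_RE.findall(health_detail_text):
        blocked[osd] = blocked.get(osd, 0) + int(count)
    return blocked


def fetch_health_detail(conffile="/etc/ceph/ceph.conf"):
    """Ask the monitors for health detail and return the reply as text."""
    # Imported lazily so the parser above can be used without librados.
    import rados

    # Assumed JSON form of "ceph health detail".
    cmd = json.dumps({"prefix": "health", "detail": "detail"})
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ret, outbuf, outs = cluster.mon_command(cmd, b"")
    finally:
        cluster.shutdown()
    if ret != 0:
        raise RuntimeError("health command failed (%d): %s" % (ret, outs))
    # Join both reply fields, since it is not assumed here which one
    # carries the detail lines.
    if isinstance(outbuf, bytes):
        outbuf = outbuf.decode("utf-8", "replace")
    return "%s\n%s" % (outbuf, outs)
```

Calling `parse_blocked_osds(fetch_health_detail())` periodically and appending the result to a file with a timestamp would give a history of which OSDs had blocked requests, though only for events that are still ongoing at the moment of each poll.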