On Wed, May 16, 2018 at 6:16 PM, Uwe Sauter <uwe.sauter...@gmail.com> wrote:
> Hi folks,
>
> I'm currently chewing on an issue regarding "slow requests are blocked". I'd 
> like to identify the OSD that is causing those events
> once the cluster is back to HEALTH_OK (as I have no monitoring yet that would 
> get this info in realtime).
>
> Collecting this information could help identify aging disks if you were able 
> to accumulate and analyze which OSD had blocking
> requests in the past and how often those events occur.
>
> My research so far leads me to think that this information is only available
> as long as the requests are actually blocked. Is this correct?

You don't say which version you are running, but see
https://tracker.ceph.com/issues/23205

>
> The MON logs only show that these events occur and how many requests are
> blocked, but give no indication of which OSD is affected. Is there a way to
> identify blocked requests from the OSD log files?
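For after-the-fact analysis (which OSD had slow requests, and how often), one option is to count slow-request warnings per OSD in the cluster log. This is a minimal sketch, assuming the OSDs' warnings appear with an "osd.N ... slow request <age> seconds old" pattern. The exact wording, and whether they land in the cluster log (typically /var/log/ceph/ceph.log on the monitor hosts) or only in the OSD's own log, vary between Ceph releases, so check your logs and adjust SLOW_RE accordingly:

```python
import re
import sys
from collections import Counter

# ASSUMPTION: an OSD's slow-request warning looks roughly like
#   "... osd.12 ... : cluster [WRN] slow request 30.5 seconds old, received at ..."
# Summary lines such as "2 slow requests, 1 included below" are deliberately
# not matched, so each blocked op is counted once. Adjust for your release.
SLOW_RE = re.compile(r"\b(osd\.\d+)\b.*?\bslow request (\d+(?:\.\d+)?) seconds old")


def count_slow_requests(lines):
    """Return (slow-request count per OSD, oldest age in seconds per OSD)."""
    counts = Counter()
    oldest = {}
    for line in lines:
        m = SLOW_RE.search(line)
        if not m:
            continue
        osd, age = m.group(1), float(m.group(2))
        counts[osd] += 1
        oldest[osd] = max(oldest.get(osd, 0.0), age)
    return counts, oldest


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python slow_osds.py /var/log/ceph/ceph.log
    with open(sys.argv[1]) as fh:
        counts, oldest = count_slow_requests(fh)
    for osd, n in counts.most_common():
        print(f"{osd}: {n} slow request(s), oldest {oldest[osd]:.1f}s")
```

Running it over rotated logs and comparing the counts over time is one way to spot the "aging disk" pattern you describe.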
>
>
> On a side note: I was trying to write a small Python script that would
> extract this kind of information in realtime. I was able to register a
> MonitorLog callback that receives the same messages you would get with
> "ceph -w", but I haven't found anything in the librados Python bindings
> documentation that does the equivalent of "ceph health detail". Any
> suggestions on how to get the blocking OSDs via librados?
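One way to get the "ceph health detail" data through the Python bindings is Rados.mon_command(), sending the same JSON command the CLI sends. This is a sketch, assuming a release whose JSON health output contains per-check messages such as "osd.4 has blocked requests > 65.536 sec" or "osds 1,7 have blocked requests > 32.768 sec". The layout and wording differ between releases, so the parser deliberately scans every string in the reply for "blocked" rather than relying on specific keys:

```python
import json
import re
import sys

# ASSUMED message formats; verify against your own "ceph health detail" output.
SINGLE_OSD_RE = re.compile(r"\bosd\.(\d+)")           # "osd.4 has blocked requests ..."
OSD_LIST_RE = re.compile(r"\bosds ((?:\d+,)*\d+)")    # "osds 1,7 have blocked requests ..."


def health_detail(cluster):
    """Return the parsed JSON equivalent of `ceph health detail`."""
    cmd = json.dumps({"prefix": "health", "detail": "detail", "format": "json"})
    ret, outbuf, outs = cluster.mon_command(cmd, b"")
    if ret != 0:
        raise RuntimeError(f"health detail failed ({ret}): {outs}")
    return json.loads(outbuf)


def iter_strings(node):
    """Yield every string value nested anywhere in a JSON structure."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def blocked_osds(health):
    """Collect OSD ids mentioned in health messages about blocked requests."""
    osds = set()
    for text in iter_strings(health):
        if "blocked" not in text:
            continue
        osds.update(int(n) for n in SINGLE_OSD_RE.findall(text))
        for group in OSD_LIST_RE.findall(text):
            osds.update(int(n) for n in group.split(","))
    return sorted(osds)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python blocked_osds.py /etc/ceph/ceph.conf
    import rados  # python-rados; imported here so the helpers work without it

    cluster = rados.Rados(conffile=sys.argv[1])
    cluster.connect()
    try:
        print(blocked_osds(health_detail(cluster)))
    finally:
        cluster.shutdown()
```

Polling this periodically (e.g. from cron) and recording the result would build up the per-OSD history you are after, though it only sees requests that are blocked at the moment of the call.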
>
>
> Thanks,
>
>         Uwe
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad