Try capturing another log with debug_ms turned up; 1 or 5 should be fine
to start with.
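
For example, something along these lines at runtime (injectargs is just one
way to do it; adjust the OSD ids to the ones involved):

  ceph tell osd.11 injectargs '--debug_ms 1'
  ceph tell osd.23 injectargs '--debug_ms 1'
  # reproduce the slow request, collect the OSD logs, then lower it again
  ceph tell osd.11 injectargs '--debug_ms 0'
  ceph tell osd.23 injectargs '--debug_ms 0'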

On Fri, Feb 8, 2019 at 8:37 PM Massimo Sgaravatto
<massimo.sgarava...@gmail.com> wrote:
>
> Our Luminous ceph cluster has been working without problems for a while, but
> in the last few days we have been suffering from continuous slow requests.
>
> We have indeed made some changes in the infrastructure recently:
>
> - Moved OSD nodes to a new switch
> - Increased the pg num for a pool, to reach about 100 PGs/OSD (also because we
> have to install new OSDs in the cluster); roughly as in the commands below. The
> output of 'ceph osd df' is attached.
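>
> For reference, the increase was done with something along these lines (the
> pool name and the target value here are just placeholders):
>
>   ceph osd pool set <pool> pg_num 1024
>   ceph osd pool set <pool> pgp_num 1024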
>
> The problem could also be due to some 'bad' client, but in the logs I don't
> see a clear correlation with specific clients or images for such blocked
> requests.
>
> I also tried updating to the latest Luminous release and the latest CentOS 7,
> but this didn't help.
>
>
>
> Attached you can find the details of one such slow operation, which took
> about 266 secs (output from 'ceph daemon osd.11 dump_historic_ops').
> As far as I can understand from these events:
>                     {
>                         "time": "2019-02-08 10:26:25.651728",
>                         "event": "op_commit"
>                     },
>                     {
>                         "time": "2019-02-08 10:26:25.651965",
>                         "event": "op_applied"
>                     },
>
>                     {
>                         "time": "2019-02-08 10:26:25.653236",
>                         "event": "sub_op_commit_rec from 33"
>                     },
>                     {
>                         "time": "2019-02-08 10:30:51.890404",
>                         "event": "sub_op_commit_rec from 23"
>                     },
>
> the problem seems to be with the "sub_op_commit_rec from 23" event, which took
> far too long. If I understand correctly, that event is recorded when the
> primary receives the commit acknowledgement from a replica, so the problem is
> that the reply from OSD 23 took too long?
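>
> If it helps, I can also watch osd.23 directly while the problem is happening,
> e.g. (assuming the admin socket is reachable on that node):
>
>   ceph daemon osd.23 dump_ops_in_flight
>   ceph daemon osd.23 dump_historic_ops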
>
>
> In the logs of the two OSDs (11 and 23) in that time frame (attached) I can't
> find anything useful.
> When the problem happened, the load and memory usage were not high on the
> relevant nodes.
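>
> Given the recent move to the new switch, would a basic network check between
> the OSD hosts make sense too? Something like (hostnames are placeholders):
>
>   ping -c 10 <osd23-host>
>   ping -M do -s 8972 <osd23-host>   # only relevant if jumbo frames are in use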
>
>
> Any help debugging the issue is really appreciated! :-)
>
> Thanks, Massimo
>



-- 
Cheers,
Brad
