[ceph-users] Re: octopus rbd cluster just stopped out of nowhere (>20k slow ops)

Boris Behrens Sun, 04 Dec 2022 11:59:57 -0800

@Alex:
the issue is resolved for now, but I fear it might come back at some point. The
cluster had been running fine for months.
I will check whether we can restart the switches easily. Host reboots should
also be no problem.


There is no "implicated OSD" message in the logs.
All OSDs were recreated 3 months ago (sync out, destroy, wipe, create,
sync in). Maybe I will reinstall with Ubuntu 20.04 (currently CentOS 7) to get
a newer kernel.

On Sun, 4 Dec 2022 at 19:58, Alex Gorbachev <
a...@iss-integration.com> wrote:

> Hi Boris,
>
> These waits seem to be all over the place.  Usually, in the main ceph.log
> you see "implicated OSD" messages - I would try to find some commonality
> with either a host, switch, or something like that.  Can be bad ports/NICs,
> LACP problems, even bad cables sometimes.  I try to isolate an area that is
> problematic.  Sometimes rebooting OSD hosts one at a time.  Rebooting
> switches (if stacked/MLAG) one at a time.  Something has got to be there,
> which makes the problem go away.
> --
> Alex Gorbachev
> https://alextelescope.blogspot.com
>
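For anyone who does find such messages, the search for commonality that Alex describes could be sketched roughly as below. This is only a sketch under assumptions: the regex assumes the phrase "implicated osds" followed by a list of numeric OSD ids, which may not match the wording of every Ceph release, and the function names and the OSD-to-host mapping are hypothetical helpers, not part of any Ceph tooling.

```python
import re
from collections import Counter

# Assumption: the message contains "implicated osd(s)" followed by a comma- or
# space-separated list of numeric OSD ids. Adjust to your release's wording.
IMPLICATED_RE = re.compile(r"implicated osds?\s+([\d,\s]+)", re.IGNORECASE)


def count_implicated_osds(lines):
    """Return a Counter mapping OSD id -> number of log lines implicating it."""
    counts = Counter()
    for line in lines:
        match = IMPLICATED_RE.search(line)
        if not match:
            continue
        for osd_id in re.findall(r"\d+", match.group(1)):
            counts[int(osd_id)] += 1
    return counts


def group_by_host(counts, osd_to_host):
    """Sum per-OSD counts per host, using a caller-supplied osd -> host map."""
    per_host = Counter()
    for osd_id, n in counts.items():
        per_host[osd_to_host.get(osd_id, "unknown")] += n
    return per_host
```

Passing an open handle on the cluster log to `count_implicated_osds` and then calling `.most_common(10)` on the result would list the OSDs named most often; `group_by_host` then shows whether they cluster on a single host, which in turn points at a NIC, port, or cable.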
-- 
The "UTF-8 problems" self-help group will, as an exception, meet in the
large hall this time.
