On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer <[email protected]> wrote:
> We have the same problems; we need to start the OSDs slowly.
> The problem seems to be CPU congestion. A booting OSD will use all the
> CPU power you give it, and if it doesn't have enough, nasty things happen
> (this might actually be the manifestation of some kind of problem in our
> setup as well).
> It doesn't always do that - I was restarting our hosts this weekend and
> most of them came up fine with a simple "service ceph start", but some
> just sat there spinning the CPU and not doing any real work (and the
> cluster was not very happy about that).
>
> Jan
>
>> On 15 Jul 2015, at 10:53, Kostis Fardelas <[email protected]> wrote:
>>
>> Hello,
>> after some trial and error we concluded that if we start the 6 stopped
>> OSD daemons with a delay of 1 minute, we do not experience slow
>> requests (the threshold is set at 30 sec), although there are some ops
>> that last up to 10 s, which is already high enough. I assume that if we
>> spread the delay out more, the slow requests will vanish. We have not
>> ruled out that our setup is less than perfectly tuned, but I wonder
>> whether we are missing something in terms of ceph configuration.
>>
>> We run the latest stable firefly version.
>>
>> Regards,
>> Kostis
>>
>> On 13 July 2015 at 13:28, Kostis Fardelas <[email protected]> wrote:
>>> Hello,
>>> after rebooting a ceph node and the OSDs booting and rejoining the
>>> cluster, we experience slow requests that get resolved immediately
>>> after the cluster recovers. It is important to note that before the
>>> node reboot we set the noout flag in order to prevent recovery - so
>>> there are only degraded PGs while the OSDs are down - and let the
>>> cluster handle the OSDs going down and up in the lightest way.
>>>
>>> Is there any tunable we should consider in order to avoid service
>>> degradation for our ceph clients?
>>>
>>> Regards,
>>> Kostis
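For what it's worth, the staggered start described above is easy to script. A minimal sketch, assuming sysvinit-managed OSDs (the osd ids 0-5 and the 60-second gap are placeholders; adjust both for your node):

    # Keep the cluster from marking this node's OSDs out while they cycle.
    ceph osd set noout

    # Start the OSDs one at a time so they do not all replay maps and
    # peer simultaneously.
    for id in 0 1 2 3 4 5; do
        service ceph start osd.$id
        sleep 60
    done

    # Clear the flag once the cluster is back to HEALTH_OK.
    ceph osd unset noout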
As far as I've seen this problem, the main issue for regular disk-backed OSDs is IOPS starvation for some interval after an OSD reads its maps from the filestore and marks itself 'in' - even if the in-memory caches are still hot, I/O degrades significantly for a short period. A possible workaround for an otherwise healthy cluster undergoing a node-wide restart is to set the norecover flag, which greatly reduces the chance of hitting slow operations. Of course, this applies only to a non-empty cluster with tens of percent of average utilization on rotating media. I pointed out this issue a couple of years ago (it *does* break a 30s I/O SLA for a returning OSD, whereas refilling the same OSDs from scratch would not violate that SLA, at the cost of a far longer completion time for the refill). From the UX side, it would be great to introduce some kind of recovery throttle for newly started OSDs, as osd_recovery_delay_start does not prevent recovery from starting immediately.
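A minimal sketch of that norecover workaround for a planned node-wide restart (the flags are standard cluster flags; the sequencing is only an illustration and assumes the rest of the cluster is healthy):

    # Before the reboot: keep the OSDs 'in' and hold recovery off entirely.
    ceph osd set noout
    ceph osd set norecover

    # ... reboot the node and wait for all of its OSDs to come back up ...

    # Let recovery proceed once the returning OSDs have settled.
    ceph osd unset norecover
    ceph osd unset noout

In the absence of a per-OSD recovery throttle, the existing knobs can at least soften the load once recovery does start; for example, injected at runtime:

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'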
