On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer <[email protected]> wrote:
> We have the same problems, we need to start the OSDs slowly.
> The problem seems to be CPU congestion. A booting OSD will use all the 
> CPU power you give it, and if it doesn’t have enough, nasty things happen 
> (this might actually be the manifestation of some kind of problem in our 
> setup as well).
> It doesn’t always do that - I was restarting our hosts this weekend and most 
> of them came up fine with a simple “service ceph start”, but some just sat there 
> spinning the CPU and not doing any real work (and the cluster was not very 
> happy about that).
>
> Jan
>
>
>> On 15 Jul 2015, at 10:53, Kostis Fardelas <[email protected]> wrote:
>>
>> Hello,
>> after some trial and error we concluded that if we start the 6 stopped
>> OSD daemons with a delay of 1 minute between them, we do not experience slow
>> requests (the threshold is set to 30 sec), although there are some ops
>> that last up to 10s, which is already quite high. I assume that if we
>> spread the delay out further, the slow requests will vanish. We cannot
>> rule out that our setup is not tuned down to the finest detail, but I
>> wonder whether we are missing some ceph tuning in terms of ceph
>> configuration.
>>
>> We run the latest stable firefly version.
>>
>> Regards,
>> Kostis
>>
>> On 13 July 2015 at 13:28, Kostis Fardelas <[email protected]> wrote:
>>> Hello,
>>> after rebooting a ceph node, while the OSDs were booting and rejoining
>>> the cluster, we experienced slow requests that got resolved immediately
>>> after the cluster recovered. It is important to note that before the
>>> node reboot we set the noout flag in order to prevent recovery - so
>>> there are only degraded PGs while the OSDs are down - and let the
>>> cluster handle the OSDs going down/up in the lightest way possible.
>>>
>>> Is there any tunable we should consider in order to avoid service
>>> degradation for our ceph clients?
>>>
>>> Regards,
>>> Kostis
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
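The staggered start Kostis describes can be sketched roughly as below. The OSD ids, the helper name, and the dry-run switch are all hypothetical; the 60-second delay matches the spacing that avoided slow requests for him, and the firefly-era sysvinit `service ceph start osd.N` form is assumed.

```shell
#!/bin/sh
# Sketch: bring a node's OSDs up one at a time, sleeping between
# starts so booting OSDs don't all contend for CPU/IOPS at once.

RUN=echo     # dry run: print commands; set RUN= to actually execute
DELAY=0      # use 60 (seconds) in production; 0 keeps the dry run instant

staggered_start() {
    for id in "$@"; do
        $RUN service ceph start "osd.$id"   # firefly-era sysvinit service
        sleep "$DELAY"
    done
}

# Hypothetical OSD ids on the rebooted node:
staggered_start 12 13 14 15 16 17
```

Run as-is it only prints the commands; clearing `RUN` executes them for real.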


As far as I've seen this problem, the main issue for regular
disk-backed OSDs is IOPS starvation during some interval after
reading maps from the filestore and marking itself 'in' - even if
in-memory caches are still hot, I/O will degrade significantly for a
short period. A possible workaround for an otherwise healthy cluster
and a node-wide restart is to set the norecover flag; it greatly
reduces the chance of hitting slow operations. Of course, it is
applicable only to a non-empty cluster with average utilization in the
tens of percent for rotating media. I first pointed out this issue a
couple of years ago (it *does* break the 30s I/O SLA for a returning
OSD, whereas refilling the same OSD from scratch would not violate
that SLA, at the cost of a far longer completion time for the refill).
From the UX side, it would be great to introduce some kind of recovery
throttler for newly started OSDs, as osd_recovery_delay_start does not
prevent immediate recovery procedures.
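A minimal sketch of that workaround, assuming an otherwise healthy cluster: flag recovery off before the node reboot, let the OSDs rejoin and warm their caches, then re-enable it. The helper names are hypothetical and the commands are printed rather than executed unless the dry-run variable is cleared.

```shell
#!/bin/sh
# Sketch: suppress recovery around a planned node reboot so returning
# OSDs aren't hit by recovery I/O while they are still IOPS-starved.

RUN=echo     # dry run: print commands; set RUN= to execute for real

pre_reboot() {
    $RUN ceph osd set noout       # don't rebalance while the node is down
    $RUN ceph osd set norecover   # don't start recovery the moment OSDs return
}

post_reboot() {
    # once all PGs have peered and are active, allow recovery again
    $RUN ceph osd unset norecover
    $RUN ceph osd unset noout
}

pre_reboot
# ... reboot the node, wait for the OSDs to boot and peer ...
post_reboot
```

The norecover flag only defers recovery; the degraded PGs still have to recover once it is unset, so this trades a burst at boot for a controlled start later.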
