----- On 20 Oct 16, at 15:03, Oliver Dzombic <[email protected]> wrote: 

> Hi Christian,

> thank you for your time.

> The problem is deep scrub only.

> Jewel 10.2.2 is used.

> Thank you for your hint about manual deep scrubs on specific OSDs; I
> hadn't thought of that.

> -----

> Where did you learn about osd_scrub_sleep?

> Lately I have seen many "hidden" config options mentioned multiple
> times on the mailing list ( where "hidden" means anything not covered
> in the docs @ ceph.com ).

> ceph.com does not document the osd_scrub_sleep config option ( except
> for mentions in past release notes ).

> Search engines find it mainly on GitHub or in the bug tracker.

> Is there any source for a (complete) list of available config options,
> usable by normal admins?

Hi Oliver, 

This is probably what you're looking for: 
https://github.com/ceph/ceph/blob/master/src/common/config_opts.h 

You can change the Branch on the left to match the version of your cluster. 
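Besides the source, a running cluster can report its full option set itself. The commands below are a sketch, not a runnable script; they assume a Jewel-era `ceph` CLI and an admin socket for an OSD named osd.0:

```shell
# Dump every config option, with its current value, from a running OSD
# via its admin socket:
ceph daemon osd.0 config show

# Or list the compiled-in defaults without touching a daemon, filtered
# down to the scrub-related options:
ceph --show-config | grep scrub
```

Either output includes osd_scrub_sleep and the other options that are absent from the ceph.com docs.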

Regards, 

Frederic. 

> Or is it really necessary to dig through source code and release
> notes to collect that kind of information on your own?

> --
> Mit freundlichen Gruessen / Best regards

> Oliver Dzombic
> IP-Interactive

> mailto:[email protected]

> Address:

> IP Interactive UG ( haftungsbeschraenkt )
> Zum Sonnenberg 1-3
> 63571 Gelnhausen

> HRB 93402 at the district court of Hanau
> Managing director: Oliver Dzombic

> Tax no.: 35 236 3622 1
> VAT ID: DE274086107

> On 20.10.2016 at 14:39, Christian Balzer wrote:

> > Hello,

> > On Thu, 20 Oct 2016 11:23:54 +0200 Oliver Dzombic wrote:

> >> Hi,

> >> we have here globally:

> >> osd_client_op_priority = 63
> >> osd_disk_thread_ioprio_class = idle
> >> osd_disk_thread_ioprio_priority = 7
> >> osd_max_scrubs = 1

> > If you google for osd_max_scrubs you will find plenty of threads, bug
> > reports, etc.

> > The most significant and beneficial impact for client I/O can be achieved
> > by telling scrub to release its deadly grip on the OSDs with something like
> > osd_scrub_sleep = 0.1
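For reference, this value can be applied at runtime and persisted across restarts. A sketch assuming Jewel's injectargs syntax (a command fragment, not a runnable script):

```shell
# Apply to all OSDs at runtime, no restart needed:
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1'

# To persist across restarts, also add under [osd] in ceph.conf:
#   osd_scrub_sleep = 0.1
```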

> > Also which version, Hammer IIRC?
> > Jewel's unified queue should help as well, but no first hand experience
> > here.

> >> to influence the scrubbing performance and

> >> osd_scrub_begin_hour = 1
> >> osd_scrub_end_hour = 7

> >> to influence the scrubbing time frame


> >> Now, as it seems, this time frame is/was not enough, so Ceph started
> >> scrubbing all the time, I assume because of the age of the objects.

> > You may want to line things up, so that OSDs/PGs are evenly spread out.
> > For example with 6 OSDs, manually initiate a deep scrub each day (at 01:00
> > in your case), so that only a specific subset is doing deep scrub conga.
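The rotation Christian describes can be sketched as a small cron job; this is a hypothetical example for a 6-OSD cluster (the OSD count and the 01:00 cron slot are assumptions), with the actual `ceph` call left commented out:

```shell
#!/bin/sh
# Sketch: run daily from cron at 01:00 so each of 6 OSDs gets its deep
# scrub on its own day of a 6-day cycle. Assumes the ceph CLI and an
# admin keyring are available on this node.
num_osds=6
day=$(( $(date +%s) / 86400 ))   # days since the epoch
osd=$(( day % num_osds ))        # today's OSD in the rotation
echo "Deep-scrubbing osd.${osd}"
# ceph osd deep-scrub "${osd}"   # uncomment on a real cluster
```

Extending osd_scrub_min_interval / osd_deep_scrub_interval far enough that Ceph does not also schedule its own deep scrubs is left to taste.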


> >> And it does it with:

> >> 4 active+clean+scrubbing+deep

> >> ( instead of the configured 1 )

> > That's per OSD, not global, see above, google.


> >> So now we experience a situation where the spinning drives are so
> >> busy that the IO performance got too bad.

> >> The only reason it's not a catastrophe is that we have a cache tier
> >> in front of it, which lowers the IO load on the spinning drives.

> >> Unluckily we have also some pools going directly on the spinning drives.

> >> So these pools experience a very bad IO performance.

> >> So we had to disable scrubbing during business hours ( which is not
> >> really a solution ).

> > It is, unfortunately, for many people.
> > As mentioned many times, if your cluster is having issues with deep-scrubs
> > during peak hours, it will also be unhappy if you lose an OSD and
> > backfills happen.
> > If it is unhappy with normal scrubs, you need to upgrade/expand HW
> > immediately.

> >> So any idea why

> >> 1. Why do we see 4-5 scrubs while osd_max_scrubs = 1 is set?
> > See above.

> > With BlueStore in the wings and reduced (negated?) need for deep-scrubs, I
> > doubt this will see much coding effort.

> >> 2. Why is the impact on the spinning drives so hard, even though we
> >> lowered the IO priority for it?

> > That has only a small impact; deep-scrub by its very nature reads all
> > objects and thus kills I/O with seeks and cache pollution.


> > Christian

> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
