They do seem to exist in Jewel.
-Sam

On Fri, Feb 3, 2017 at 10:12 AM, David Turner <[email protected]> wrote:
> After searching the code, osd_snap_trim_cost and osd_snap_trim_priority
> exist in master but not in Jewel or Kraken. If osd_snap_trim_sleep was
> made useless in Jewel by moving snap trimming to the main op thread, and
> no new feature was added to Jewel to allow clusters to throttle snap
> trimming... what recourse do people who use a lot of snapshots have on
> Jewel? Luckily this thread came around right before we were ready to push
> to production; we tested snap trimming heavily in QA and found that on
> Jewel we can't keep up with even half of the snap trimming we need.
> None of these settings are injectable into the osd daemon either, so
> changing them would take a full restart of all of the osds...
>
> Does anyone have any success stories for snap trimming on Jewel?
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
> From: Samuel Just [[email protected]]
> Sent: Thursday, January 26, 2017 1:14 PM
> To: Nick Fisk
> Cc: David Turner; ceph-users
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> Just an update. I think the real goal of the sleep configs in general
> was to reduce the number of concurrent snap trims happening. To that end,
> I've put together a branch which adds an AsyncReserver (as with backfill)
> for snap trims to each OSD.
> Before actually starting to do trim work, the primary will wait in line
> to get one of the slots and will hold that slot until the repops are
> complete. https://github.com/athanatos/ceph/tree/wip-snap-trim-sleep is
> the branch (based on master), but I've got a bit more work (and testing)
> to do before it's ready to be tried.
> -Sam
>
> On Fri, Jan 20, 2017 at 2:05 PM, Nick Fisk <[email protected]> wrote:
>
>> Hi Sam,
>>
>> I have a test cluster, albeit small. I'm happy to run tests and graph
>> results with a wip branch and work out reasonable settings, etc.
>>
>> From: Samuel Just [mailto:[email protected]]
>> Sent: 19 January 2017 23:23
>> To: David Turner <[email protected]>
>> Cc: Nick Fisk <[email protected]>; ceph-users <[email protected]>
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> I could probably put together a wip branch if you have a test cluster
>> you could try it out on.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 2:27 PM, David Turner <[email protected]> wrote:
>>
>> To be clear, we are willing to change to a snap_trim_sleep of 0 and try
>> to manage it with the other available settings... but it is sounding
>> like that won't really work for us, since our main op thread(s) will
>> just be saturated with snap trimming almost all day. We currently only
>> have ~6 hours/day where our snap trim queues are empty.
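The slot scheme Sam describes can be sketched roughly like this -- a toy Python model, not Ceph's actual C++ AsyncReserver; class and method names are illustrative:

```python
from collections import deque

class SnapTrimReserver:
    """Toy model of per-OSD snap-trim slots: a PG primary queues for a
    slot before trimming and holds it until its repops complete."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.granted = set()      # PGs currently holding a slot
        self.waiting = deque()    # PGs waiting in line, FIFO

    def request(self, pg):
        # Grant immediately if a slot is free, otherwise wait in line.
        if len(self.granted) < self.max_slots:
            self.granted.add(pg)
            return True
        self.waiting.append(pg)
        return False

    def release(self, pg):
        # Called when the PG's trim repops are complete; hand the freed
        # slot to the next PG in line, if any.
        self.granted.discard(pg)
        if self.waiting and len(self.granted) < self.max_slots:
            self.granted.add(self.waiting.popleft())

r = SnapTrimReserver(max_slots=2)
print(r.request("pg1"))    # True  - slot free
print(r.request("pg2"))    # True
print(r.request("pg3"))    # False - waits in line
r.release("pg1")           # pg3 now holds the freed slot
print("pg3" in r.granted)  # True
```

The point of the design is that the *number* of concurrent trims is bounded explicitly, instead of relying on sleeps to space them out.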
>> From: ceph-users [[email protected]] on behalf of
>> David Turner [[email protected]]
>> Sent: Thursday, January 19, 2017 3:25 PM
>> To: Samuel Just; Nick Fisk
>> Cc: ceph-users
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> We are a couple of weeks away from upgrading to Jewel in our production
>> clusters (after months of testing in our QA environments), but this
>> might prevent us from making the migration from Hammer. We delete
>> ~8,000 snapshots/day between 3 clusters, and our snap_trim_q gets up to
>> about 60 million in each of those clusters. We have to use an
>> osd_snap_trim_sleep of 0.25 to prevent our clusters from falling on
>> their faces during our big load, and 0.1 the rest of the day to catch
>> up on the snap trim queue.
>>
>> Is our setup possible to use on Jewel?
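David's Hammer-era throttle amounts to a ceph.conf fragment along these lines -- a hypothetical sketch using the values he quotes, switched by editing the config (he notes the setting is not injectable on Jewel, so a change means an OSD restart):

```ini
# Hypothetical [osd] fragment; values taken from David's message.
[osd]
osd_snap_trim_sleep = 0.25    ; during the big snapshot-deletion load
;osd_snap_trim_sleep = 0.1    ; rest of the day, to drain the trim queue
```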
>> From: ceph-users [[email protected]] on behalf of
>> Samuel Just [[email protected]]
>> Sent: Thursday, January 19, 2017 2:45 PM
>> To: Nick Fisk
>> Cc: ceph-users
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> Yeah, I think you're probably right. The answer is probably to add an
>> explicit rate-limiting element to the way the snaptrim events are
>> scheduled.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 1:34 PM, Nick Fisk <[email protected]> wrote:
>>
>>> I will give those both a go and report back, but the more I think
>>> about this, the less I'm convinced that it's going to help.
>>>
>>> I think the problem is a general IO imbalance: there is probably
>>> something like 100+ times more trimming IO than client IO, so even if
>>> client IO gets promoted to the front of the queue by Ceph, once it
>>> hits the Linux IO layer it's fighting for itself.
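The explicit rate-limiting element Sam mentions could look something like a token bucket; a minimal sketch (illustrative only, not the actual implementation -- names and rates are made up):

```python
import time

class TrimRateLimiter:
    """Token-bucket sketch: a snaptrim event is dispatched only when a
    token is available, capping trims/sec regardless of queue depth."""

    def __init__(self, trims_per_sec, burst):
        self.rate = trims_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_dispatch(self):
        # Refill tokens in proportion to elapsed time, up to the burst cap.
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # schedule one snaptrim event
        return False      # leave it queued; client IO proceeds meanwhile

limiter = TrimRateLimiter(trims_per_sec=100, burst=10)
dispatched = sum(limiter.try_dispatch() for _ in range(1000))
print(dispatched)  # roughly the initial burst (~10) goes through at once
```

Unlike a sleep, nothing here blocks a thread: a trim that can't get a token simply stays queued while other work runs.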
>>> I guess this approach works with scrubbing because each read IO has
>>> to wait to be read before the next one is submitted, so the queue can
>>> be managed on the OSD. With trimming, writes can buffer up below what
>>> the OSD controls.
>>>
>>> I don't know if the snap trimming goes nuts because the journals are
>>> acking each request and the spinning disks can't keep up, or if it's
>>> something else. Does WBThrottle get involved with snap trimming?
>>>
>>> But from an underlying disk perspective, there is definitely more
>>> than 2 snaps per OSD at a time going on, even if the OSD itself is
>>> not processing more than 2 at a time. I think there either needs to
>>> be another knob so that Ceph can throttle back snaps, not just
>>> de-prioritise them. Or, there needs to be a whole new kernel
>>> interface where an application can priority-tag individual IOs for
>>> CFQ to handle, instead of the current limitation of priority per
>>> thread. I realise this is probably very, very hard or impossible, but
>>> it would allow Ceph to control IO queues right down to the disk.
>>>
>>>> -----Original Message-----
>>>> From: Samuel Just [mailto:[email protected]]
>>>> Sent: 19 January 2017 18:58
>>>> To: Nick Fisk <[email protected]>
>>>> Cc: Dan van der Ster <[email protected]>; ceph-users <[email protected]>
>>>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>>>
>>>> Have you also tried setting osd_snap_trim_cost to 16777216 (16x the
>>>> default value, equal to a 16MB IO) and
>>>> osd_pg_max_concurrent_snap_trims to 1 (from 2)?
>>>> -Sam
>>>>
>>>> On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <[email protected]> wrote:
>>>>> Hi Sam,
>>>>>
>>>>> Thanks for the confirmation of which thread the trimming happens
>>>>> in, and for confirming my suspicion that sleeping is now a bad idea.
>>>>>
>>>>> The problem I see is that even with the priority for trimming set
>>>>> low, it still seems to completely swamp the cluster.
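Sam's suggested cost value works out to a 16 MiB IO, i.e. 16x a 1 MiB default; in a cost-aware op queue a higher cost makes each trim consume a bigger share of the dispatch budget, so fewer trims are scheduled per unit of client IO:

```python
# osd_snap_trim_cost arithmetic from Sam's suggestion: 16777216 is 16x
# the default, which he equates to a 16 MB IO.
default_cost = 1 * 2**20       # 1 MiB = 1048576
suggested = 16 * default_cost  # 16 MiB
print(suggested)               # 16777216
```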
The >> >> trims seem to get submitted in an async nature which seems to leave >> all my disks sitting at queue depths of 50+ for several minutes >> >> until the snapshot is removed, often also causing several OSD's to get >> marked out and start flapping. I'm using WPQ but haven't >> >> changed the cutoff variable yet as I know you are working on fixing a >> bug with that. >> >> > >> >> > Nick >> >> > >> >> >> -----Original Message----- >> >> >> From: Samuel Just [mailto:[email protected]] >> >> >> Sent: 19 January 2017 15:47 >> >> >> To: Dan van der Ster <[email protected]> >> >> >> Cc: Nick Fisk <[email protected]>; ceph-users >> >> >> <[email protected]> >> >> >> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during >> sleep? >> >> >> >> >> >> Snaptrimming is now in the main op threadpool along with scrub, >> >> >> recovery, and client IO. I don't think it's a good idea to use any >> of the _sleep configs anymore -- the intention is that by setting the >> >> priority low, they won't actually be scheduled much. >> >> >> -Sam >> >> >> >> >> >> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster < >> [email protected]> wrote: >> >> >> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <[email protected]> >> wrote: >> >> >> >> Hi Dan, >> >> >> >> >> >> >> >> I carried out some more testing after doubling the op threads, it >> >> >> >> may have had a small benefit as potentially some threads are >> >> >> >> available, but latency still sits more or less around the >> >> >> >> configured snap sleep time. Even more threads might help, but I >> >> >> >> suspect you are just >> >> >> lowering the chance of IO's that are stuck behind the sleep, rather >> than actually solving the problem. >> >> >> >> >> >> >> >> I'm guessing when the snap trimming was in disk thread, you >> >> >> >> wouldn't have noticed these sleeps, but now it's in the op thread >> >> >> >> it will just sit there holding up all IO and be a lot more >> >> >> >> noticable. 
>>>>>>>> It might be that this option shouldn't be used with Jewel+?
>>>>>>>
>>>>>>> That's a good thought -- so we need confirmation of which thread
>>>>>>> is doing the snap trimming. I honestly can't figure it out from
>>>>>>> the code -- hopefully a dev could explain how it works.
>>>>>>>
>>>>>>> Otherwise, I don't have much practical experience with snap
>>>>>>> trimming in jewel yet -- our RBD cluster is still running 0.94.9.
>>>>>>>
>>>>>>> Cheers, Dan
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Nick Fisk
>>>>>>>>> Sent: 13 January 2017 20:38
>>>>>>>>> To: 'Dan van der Ster' <[email protected]>
>>>>>>>>> Cc: 'ceph-users' <[email protected]>
>>>>>>>>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>>>>>>>>
>>>>>>>>> We're on Jewel and you're right, I'm pretty sure the snap stuff
>>>>>>>>> is also now handled in the op thread.
>>>>>>>>>
>>>>>>>>> The dump historic ops socket command showed a 10s delay at the
>>>>>>>>> "Reached PG" stage. From Greg's response [1], that would
>>>>>>>>> suggest it isn't the OSD itself that is blocking, but the PG
>>>>>>>>> that is currently sleeping whilst trimming. I think in the
>>>>>>>>> former case it would have a high time on the "Started" part of
>>>>>>>>> the op? Anyway, I will carry out some more testing with higher
>>>>>>>>> osd op threads and see if that makes any difference. Thanks for
>>>>>>>>> the suggestion.
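For anyone reproducing Nick's observation, here is a sketch of pulling per-stage gaps out of historic-ops output. The sample JSON is made up and heavily trimmed -- real output from `ceph daemon osd.N dump_historic_ops` has more fields and string timestamps -- but the idea of diffing consecutive event times carries over:

```python
import json

# Hypothetical, trimmed-down historic-ops dump; event names and
# timestamps are fabricated to illustrate a stall before "reached_pg".
sample = json.loads("""
{"ops": [{"description": "osd_op(client.4123 rbd_data...)",
          "type_data": {"events": [
              {"time": 0.000,  "event": "initiated"},
              {"time": 0.001,  "event": "queued_for_pg"},
              {"time": 10.002, "event": "reached_pg"},
              {"time": 10.004, "event": "started"},
              {"time": 10.010, "event": "done"}]}}]}
""")

def stage_gaps(op):
    """Return (gap_seconds, event) for each stage transition."""
    events = op["type_data"]["events"]
    return [(round(b["time"] - a["time"], 3), b["event"])
            for a, b in zip(events, events[1:])]

for op in sample["ops"]:
    worst = max(stage_gaps(op))
    print(worst)  # (10.001, 'reached_pg') -- the op sat waiting for the PG
```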
>> >> >> >>> >> >> >> >>> Nick >> >> >> >>> >> >> >> >>> >> >> >> >>> [1] >> >> >> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016- >> March/00 >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/5/iDhtt_xNw-DgMZt82VicsA/aHR0cDovL2xpc3RzLmNlcGguY29tL3BpcGVybWFpbC9jZXBoLXVzZXJzLWNlcGguY29tLzIwMTYtTWFyY2gvMDA> >> >> >> >>> 865 >> >> >> >>> 2.html >> >> >> >>> >> >> >> >>> > -----Original Message----- >> >> >> >>> > From: Dan van der Ster [mailto:[email protected]] >> >> >> >>> > Sent: 13 January 2017 10:28 >> >> >> >>> > To: Nick Fisk <[email protected]> >> >> >> >>> > Cc: ceph-users <[email protected]> >> >> >> >>> > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG >> during sleep? >> >> >> >>> > >> >> >> >>> > Hammer or jewel? I've forgotten which thread pool is handling >> >> >> >>> > the snap trim nowadays -- is it the op thread yet? If so, >> >> >> >>> > perhaps all the op threads are stuck sleeping? Just a wild >> >> >> >>> > guess. (Maybe >> >> >> >> increasing # >> >> >> >>> op threads would help?). >> >> >> >>> > >> >> >> >>> > -- Dan >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> >> wrote: >> >> >> >>> > > Hi, >> >> >> >>> > > >> >> >> >>> > > I had been testing some higher values with the >> >> >> >>> > > osd_snap_trim_sleep variable to try and reduce the impact of >> >> >> >>> > > removing RBD snapshots on our cluster and I have come across >> >> >> >>> > > what I believe to be a possible unintended consequence. The >> >> >> >>> > > value of the sleep seems to keep the >> >> >> >>> > lock on the PG open so that no other IO can use the PG whilst >> the snap removal operation is sleeping. 
>> >> >> >>> > > >> >> >> >>> > > I had set the variable to 10s to completely minimise the >> >> >> >>> > > impact as I had some multi TB snapshots to remove and >> noticed >> >> >> >>> > > that suddenly all IO to the cluster had a latency of roughly >> >> >> >>> > > 10s as well, all the >> >> >> >>> > dumped ops show waiting on PG for 10s as well. >> >> >> >>> > > >> >> >> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be >> >> >> >>> > > used up to say a max of 0.1s and this is a known side >> effect, >> >> >> >>> > > or should the lock on the PG be removed so that normal IO >> can >> >> >> >>> > > continue during the >> >> >> >>> > sleeps? >> >> >> >>> > > >> >> >> >>> > > Nick >> >> >> >>> > > >> >> >> >>> > > _______________________________________________ >> >> >> >>> > > ceph-users mailing list >> >> >> >>> > > [email protected] >> >> >> >>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/6/sbHghTVyVk5JbI0iP001ew/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> >> >>> >> >> >> >>> _______________________________________________ >> >> >> >>> ceph-users mailing list >> >> >> >>> [email protected] >> >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/7/3YegxP6JOketJi-2NaAc_g/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> >> >> >> >> >> > _______________________________________________ >> >> >> > ceph-users mailing list >> >> >> > [email protected] >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/8/58_Mtx0TD6di3awWnSr7yw/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> > >> > >> 
