They do seem to exist in Jewel.
-Sam

On Fri, Feb 3, 2017 at 10:12 AM, David Turner <[email protected]> wrote:
> After searching the code, osd_snap_trim_cost and osd_snap_trim_priority
> exist in master but not in Jewel or Kraken. If osd_snap_trim_sleep was
> made useless in Jewel by moving snap trimming to the main op thread, and
> no new feature was added to Jewel to allow clusters to throttle snap
> trimming... what recourse do people who use a lot of snapshots have on
> Jewel? Luckily this thread came around right before we were ready to push
> to production; we tested snap trimming heavily in QA and found that on
> Jewel we can't keep up with even half of the snap trimming we need.
> None of these settings are injectable into the osd daemon either, so
> changing them would take a full restart of all of the osds...
>
> Does anyone have any success stories for snap trimming on Jewel?
>
> David Turner | Cloud Operations Engineer | StorageCraft Technology
> Corporation <https://storagecraft.com>
> 380 Data Drive Suite 300 | Draper | Utah | 84020
> Office: 801.871.2760 | Mobile: 385.224.2943
>
> If you are not the intended recipient of this message or received it
> erroneously, please notify the sender and delete it, together with any
> attachments, and be advised that any dissemination or copying of this
> message is prohibited.
>
> From: Samuel Just [[email protected]]
> Sent: Thursday, January 26, 2017 1:14 PM
> To: Nick Fisk
> Cc: David Turner; ceph-users
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> Just an update. I think the real goal of the sleep configs in general
> was to reduce the number of concurrent snap trims happening. To that end,
> I've put together a branch which adds an AsyncReserver (as with backfill)
> for snap trims to each OSD.
> Before actually starting to do trim work, the primary will wait in line
> to get one of the slots and will hold that slot until the repops are
> complete. https://github.com/athanatos/ceph/tree/wip-snap-trim-sleep is
> the branch (based on master), but I've got a bit more work (and testing)
> to do before it's ready to be tried.
> -Sam
>
> On Fri, Jan 20, 2017 at 2:05 PM, Nick Fisk <[email protected]> wrote:
>
>> Hi Sam,
>>
>> I have a test cluster, albeit small. I'm happy to run tests and graph
>> results with a wip branch and work out reasonable settings, etc.
>>
>> From: Samuel Just [mailto:[email protected]]
>> Sent: 19 January 2017 23:23
>> To: David Turner <[email protected]>
>> Cc: Nick Fisk <[email protected]>; ceph-users <[email protected]>
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> I could probably put together a wip branch if you have a test cluster
>> you could try it out on.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 2:27 PM, David Turner <[email protected]> wrote:
>>
>> To be clear, we are willing to change to a snap_trim_sleep of 0 and try
>> to manage it with the other available settings... but it is sounding
>> like that won't really work for us, since our main op thread(s) will
>> just be saturated with snap trimming almost all day. We currently only
>> have ~6 hours/day where our snap trim queues are empty.
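The slot scheme Sam describes can be sketched roughly like this -- a toy Python model, not Ceph's actual C++ AsyncReserver; class and method names are illustrative:

```python
from collections import deque

class SnapTrimReserver:
    """Toy model of per-OSD snap-trim slots: a PG primary queues for a
    slot before trimming and holds it until its repops complete."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self.granted = set()      # PGs currently holding a slot
        self.waiting = deque()    # PGs waiting in line, FIFO

    def request(self, pg):
        # Grant immediately if a slot is free, otherwise wait in line.
        if len(self.granted) < self.max_slots:
            self.granted.add(pg)
            return True
        self.waiting.append(pg)
        return False

    def release(self, pg):
        # Called when the PG's trim repops are complete; hand the freed
        # slot to the next PG in line, if any.
        self.granted.discard(pg)
        if self.waiting and len(self.granted) < self.max_slots:
            self.granted.add(self.waiting.popleft())

r = SnapTrimReserver(max_slots=2)
print(r.request("pg1"))    # True  - slot free
print(r.request("pg2"))    # True
print(r.request("pg3"))    # False - waits in line
r.release("pg1")           # pg3 now holds the freed slot
print("pg3" in r.granted)  # True
```

The point of the design is that the *number* of concurrent trims is bounded explicitly, instead of relying on sleeps to space them out.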
>> From: ceph-users [[email protected]] on behalf of
>> David Turner [[email protected]]
>> Sent: Thursday, January 19, 2017 3:25 PM
>> To: Samuel Just; Nick Fisk
>> Cc: ceph-users
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> We are a couple of weeks away from upgrading to Jewel in our production
>> clusters (after months of testing in our QA environments), but this
>> might prevent us from making the migration from Hammer. We delete
>> ~8,000 snapshots/day between 3 clusters, and our snap_trim_q gets up to
>> about 60 million in each of those clusters. We have to use an
>> osd_snap_trim_sleep of 0.25 to prevent our clusters from falling on
>> their faces during our big load, and 0.1 the rest of the day to catch
>> up on the snap trim queue.
>>
>> Is our setup possible to use on Jewel?
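David's Hammer-era throttle amounts to a ceph.conf fragment along these lines -- a hypothetical sketch using the values he quotes, switched by editing the config (he notes the setting is not injectable on Jewel, so a change means an OSD restart):

```ini
# Hypothetical [osd] fragment; values taken from David's message.
[osd]
osd_snap_trim_sleep = 0.25    ; during the big snapshot-deletion load
;osd_snap_trim_sleep = 0.1    ; rest of the day, to drain the trim queue
```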
>> From: ceph-users [[email protected]] on behalf of
>> Samuel Just [[email protected]]
>> Sent: Thursday, January 19, 2017 2:45 PM
>> To: Nick Fisk
>> Cc: ceph-users
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> Yeah, I think you're probably right. The answer is probably to add an
>> explicit rate-limiting element to the way the snaptrim events are
>> scheduled.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 1:34 PM, Nick Fisk <[email protected]> wrote:
>>
>>> I will give those both a go and report back, but the more I think
>>> about this, the less I'm convinced that it's going to help.
>>>
>>> I think the problem is a general IO imbalance: there is probably
>>> something like 100+ times more trimming IO than client IO, so even if
>>> client IO gets promoted to the front of the queue by Ceph, once it
>>> hits the Linux IO layer it's fighting for itself.
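The explicit rate-limiting element Sam mentions could look something like a token bucket; a minimal sketch (illustrative only, not the actual implementation -- names and rates are made up):

```python
import time

class TrimRateLimiter:
    """Token-bucket sketch: a snaptrim event is dispatched only when a
    token is available, capping trims/sec regardless of queue depth."""

    def __init__(self, trims_per_sec, burst):
        self.rate = trims_per_sec
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_dispatch(self):
        # Refill tokens in proportion to elapsed time, up to the burst cap.
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # schedule one snaptrim event
        return False      # leave it queued; client IO proceeds meanwhile

limiter = TrimRateLimiter(trims_per_sec=100, burst=10)
dispatched = sum(limiter.try_dispatch() for _ in range(1000))
print(dispatched)  # roughly the initial burst (~10) goes through at once
```

Unlike a sleep, nothing here blocks a thread: a trim that can't get a token simply stays queued while other work runs.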
>>> I guess this approach works with scrubbing because each read IO has
>>> to wait to be read before the next one is submitted, so the queue can
>>> be managed on the OSD. With trimming, writes can buffer up below what
>>> the OSD controls.
>>>
>>> I don't know if the snap trimming goes nuts because the journals are
>>> acking each request and the spinning disks can't keep up, or if it's
>>> something else. Does WBThrottle get involved with snap trimming?
>>>
>>> But from an underlying disk perspective, there is definitely more
>>> than 2 snaps per OSD at a time going on, even if the OSD itself is
>>> not processing more than 2 at a time. I think there either needs to
>>> be another knob so that Ceph can throttle back snaps, not just
>>> de-prioritise them. Or, there needs to be a whole new kernel
>>> interface where an application can priority-tag individual IOs for
>>> CFQ to handle, instead of the current limitation of priority per
>>> thread. I realise this is probably very, very hard or impossible, but
>>> it would allow Ceph to control IO queues right down to the disk.
>>>
>>>> -----Original Message-----
>>>> From: Samuel Just [mailto:[email protected]]
>>>> Sent: 19 January 2017 18:58
>>>> To: Nick Fisk <[email protected]>
>>>> Cc: Dan van der Ster <[email protected]>; ceph-users <[email protected]>
>>>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>>>
>>>> Have you also tried setting osd_snap_trim_cost to 16777216 (16x the
>>>> default value, equal to a 16MB IO) and
>>>> osd_pg_max_concurrent_snap_trims to 1 (from 2)?
>>>> -Sam
>>>>
>>>> On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <[email protected]> wrote:
>>>>> Hi Sam,
>>>>>
>>>>> Thanks for the confirmation of which thread the trimming happens
>>>>> in, and for confirming my suspicion that sleeping is now a bad idea.
>>>>>
>>>>> The problem I see is that even with the priority for trimming set
>>>>> low, it still seems to completely swamp the cluster.
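Sam's suggested cost value works out to a 16 MiB IO, i.e. 16x a 1 MiB default; in a cost-aware op queue a higher cost makes each trim consume a bigger share of the dispatch budget, so fewer trims are scheduled per unit of client IO:

```python
# osd_snap_trim_cost arithmetic from Sam's suggestion: 16777216 is 16x
# the default, which he equates to a 16 MB IO.
default_cost = 1 * 2**20       # 1 MiB = 1048576
suggested = 16 * default_cost  # 16 MiB
print(suggested)               # 16777216
```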
The >> >> trims seem to get submitted in an async nature which seems to leave >> all my disks sitting at queue depths of 50+ for several minutes >> >> until the snapshot is removed, often also causing several OSD's to get >> marked out and start flapping. I'm using WPQ but haven't >> >> changed the cutoff variable yet as I know you are working on fixing a >> bug with that. >> >> > >> >> > Nick >> >> > >> >> >> -----Original Message----- >> >> >> From: Samuel Just [mailto:[email protected]] >> >> >> Sent: 19 January 2017 15:47 >> >> >> To: Dan van der Ster <[email protected]> >> >> >> Cc: Nick Fisk <[email protected]>; ceph-users >> >> >> <[email protected]> >> >> >> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during >> sleep? >> >> >> >> >> >> Snaptrimming is now in the main op threadpool along with scrub, >> >> >> recovery, and client IO. I don't think it's a good idea to use any >> of the _sleep configs anymore -- the intention is that by setting the >> >> priority low, they won't actually be scheduled much. >> >> >> -Sam >> >> >> >> >> >> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster < >> [email protected]> wrote: >> >> >> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <[email protected]> >> wrote: >> >> >> >> Hi Dan, >> >> >> >> >> >> >> >> I carried out some more testing after doubling the op threads, it >> >> >> >> may have had a small benefit as potentially some threads are >> >> >> >> available, but latency still sits more or less around the >> >> >> >> configured snap sleep time. Even more threads might help, but I >> >> >> >> suspect you are just >> >> >> lowering the chance of IO's that are stuck behind the sleep, rather >> than actually solving the problem. >> >> >> >> >> >> >> >> I'm guessing when the snap trimming was in disk thread, you >> >> >> >> wouldn't have noticed these sleeps, but now it's in the op thread >> >> >> >> it will just sit there holding up all IO and be a lot more >> >> >> >> noticable. 
>>>>>>>> It might be that this option shouldn't be used with Jewel+?
>>>>>>>
>>>>>>> That's a good thought -- so we need confirmation of which thread
>>>>>>> is doing the snap trimming. I honestly can't figure it out from
>>>>>>> the code -- hopefully a dev could explain how it works.
>>>>>>>
>>>>>>> Otherwise, I don't have much practical experience with snap
>>>>>>> trimming in jewel yet -- our RBD cluster is still running 0.94.9.
>>>>>>>
>>>>>>> Cheers, Dan
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: ceph-users [mailto:[email protected]] On Behalf Of Nick Fisk
>>>>>>>>> Sent: 13 January 2017 20:38
>>>>>>>>> To: 'Dan van der Ster' <[email protected]>
>>>>>>>>> Cc: 'ceph-users' <[email protected]>
>>>>>>>>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>>>>>>>>
>>>>>>>>> We're on Jewel and you're right, I'm pretty sure the snap stuff
>>>>>>>>> is also now handled in the op thread.
>>>>>>>>>
>>>>>>>>> The dump historic ops socket command showed a 10s delay at the
>>>>>>>>> "Reached PG" stage. From Greg's response [1], that would
>>>>>>>>> suggest it isn't the OSD itself that is blocking, but the PG
>>>>>>>>> that is currently sleeping whilst trimming. I think in the
>>>>>>>>> former case it would have a high time on the "Started" part of
>>>>>>>>> the op? Anyway, I will carry out some more testing with higher
>>>>>>>>> osd op threads and see if that makes any difference. Thanks for
>>>>>>>>> the suggestion.
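For anyone reproducing Nick's observation, here is a sketch of pulling per-stage gaps out of historic-ops output. The sample JSON is made up and heavily trimmed -- real output from `ceph daemon osd.N dump_historic_ops` has more fields and string timestamps -- but the idea of diffing consecutive event times carries over:

```python
import json

# Hypothetical, trimmed-down historic-ops dump; event names and
# timestamps are fabricated to illustrate a stall before "reached_pg".
sample = json.loads("""
{"ops": [{"description": "osd_op(client.4123 rbd_data...)",
          "type_data": {"events": [
              {"time": 0.000,  "event": "initiated"},
              {"time": 0.001,  "event": "queued_for_pg"},
              {"time": 10.002, "event": "reached_pg"},
              {"time": 10.004, "event": "started"},
              {"time": 10.010, "event": "done"}]}}]}
""")

def stage_gaps(op):
    """Return (gap_seconds, event) for each stage transition."""
    events = op["type_data"]["events"]
    return [(round(b["time"] - a["time"], 3), b["event"])
            for a, b in zip(events, events[1:])]

for op in sample["ops"]:
    worst = max(stage_gaps(op))
    print(worst)  # (10.001, 'reached_pg') -- the op sat waiting for the PG
```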
>> >> >> >>> >> >> >> >>> Nick >> >> >> >>> >> >> >> >>> >> >> >> >>> [1] >> >> >> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016- >> March/00 >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/5/iDhtt_xNw-DgMZt82VicsA/aHR0cDovL2xpc3RzLmNlcGguY29tL3BpcGVybWFpbC9jZXBoLXVzZXJzLWNlcGguY29tLzIwMTYtTWFyY2gvMDA> >> >> >> >>> 865 >> >> >> >>> 2.html >> >> >> >>> >> >> >> >>> > -----Original Message----- >> >> >> >>> > From: Dan van der Ster [mailto:[email protected]] >> >> >> >>> > Sent: 13 January 2017 10:28 >> >> >> >>> > To: Nick Fisk <[email protected]> >> >> >> >>> > Cc: ceph-users <[email protected]> >> >> >> >>> > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG >> during sleep? >> >> >> >>> > >> >> >> >>> > Hammer or jewel? I've forgotten which thread pool is handling >> >> >> >>> > the snap trim nowadays -- is it the op thread yet? If so, >> >> >> >>> > perhaps all the op threads are stuck sleeping? Just a wild >> >> >> >>> > guess. (Maybe >> >> >> >> increasing # >> >> >> >>> op threads would help?). >> >> >> >>> > >> >> >> >>> > -- Dan >> >> >> >>> > >> >> >> >>> > >> >> >> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> >> wrote: >> >> >> >>> > > Hi, >> >> >> >>> > > >> >> >> >>> > > I had been testing some higher values with the >> >> >> >>> > > osd_snap_trim_sleep variable to try and reduce the impact of >> >> >> >>> > > removing RBD snapshots on our cluster and I have come across >> >> >> >>> > > what I believe to be a possible unintended consequence. The >> >> >> >>> > > value of the sleep seems to keep the >> >> >> >>> > lock on the PG open so that no other IO can use the PG whilst >> the snap removal operation is sleeping. 
>> >> >> >>> > > >> >> >> >>> > > I had set the variable to 10s to completely minimise the >> >> >> >>> > > impact as I had some multi TB snapshots to remove and >> noticed >> >> >> >>> > > that suddenly all IO to the cluster had a latency of roughly >> >> >> >>> > > 10s as well, all the >> >> >> >>> > dumped ops show waiting on PG for 10s as well. >> >> >> >>> > > >> >> >> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be >> >> >> >>> > > used up to say a max of 0.1s and this is a known side >> effect, >> >> >> >>> > > or should the lock on the PG be removed so that normal IO >> can >> >> >> >>> > > continue during the >> >> >> >>> > sleeps? >> >> >> >>> > > >> >> >> >>> > > Nick >> >> >> >>> > > >> >> >> >>> > > _______________________________________________ >> >> >> >>> > > ceph-users mailing list >> >> >> >>> > > [email protected] >> >> >> >>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/6/sbHghTVyVk5JbI0iP001ew/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> >> >>> >> >> >> >>> _______________________________________________ >> >> >> >>> ceph-users mailing list >> >> >> >>> [email protected] >> >> >> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/7/3YegxP6JOketJi-2NaAc_g/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> >> >> >> >> >> > _______________________________________________ >> >> >> > ceph-users mailing list >> >> >> > [email protected] >> >> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> <http://xo4t.mj.am/lnk/AEQAHbHmxC0AAAAAAAAAAFklQUAAADNJBWwAAAAAAACRXwBYgomZ7d_x14_XQr65vOmTmCx8lwAAlBI/8/58_Mtx0TD6di3awWnSr7yw/aHR0cDovL2xpc3RzLmNlcGguY29tL2xpc3RpbmZvLmNnaS9jZXBoLXVzZXJzLWNlcGguY29t> >> >> > >> > >> 
