Have you also tried setting osd_snap_trim_cost to be 16777216 (16x the default value, equal to a 16MB IO) and osd_pg_max_concurrent_snap_trims to 1 (from 2)? -Sam
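For reference, Sam's two suggested values could be applied via a ceph.conf fragment like the sketch below (a minimal example assuming the Jewel-era option names; the values are the ones Sam suggests above, not defaults):

```ini
# ceph.conf fragment -- Sam's suggested values:
# treat each snap trim as a 16MB IO for scheduling purposes,
# and allow only one concurrent trim per PG (down from 2).
[osd]
osd_snap_trim_cost = 16777216
osd_pg_max_concurrent_snap_trims = 1
```

The same values can usually be injected at runtime with `ceph tell osd.* injectargs`, though injected settings do not survive an OSD restart.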
On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <[email protected]> wrote:
> Hi Sam,
>
> Thanks for the confirmation on both which thread the trimming happens in and
> for confirming my suspicion that sleeping is now a bad idea.
>
> The problem I see is that even with the priority for trimming set down low,
> it still seems to completely swamp the cluster. The trims seem to get
> submitted asynchronously, which leaves all my disks sitting at queue depths
> of 50+ for several minutes until the snapshot is removed, often also causing
> several OSDs to get marked out and start flapping. I'm using WPQ but haven't
> changed the cutoff variable yet, as I know you are working on fixing a bug
> with that.
>
> Nick
>
>> -----Original Message-----
>> From: Samuel Just [mailto:[email protected]]
>> Sent: 19 January 2017 15:47
>> To: Dan van der Ster <[email protected]>
>> Cc: Nick Fisk <[email protected]>; ceph-users <[email protected]>
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> Snap trimming is now in the main op threadpool along with scrub, recovery,
>> and client IO. I don't think it's a good idea to use any of the _sleep
>> configs anymore -- the intention is that by setting the priority low, they
>> won't actually be scheduled much.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster <[email protected]>
>> wrote:
>> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <[email protected]> wrote:
>> >> Hi Dan,
>> >>
>> >> I carried out some more testing after doubling the op threads; it may
>> >> have had a small benefit, as potentially some threads are available, but
>> >> latency still sits more or less around the configured snap sleep time.
>> >> Even more threads might help, but I suspect you are just lowering the
>> >> chance of IOs getting stuck behind the sleep, rather than actually
>> >> solving the problem.
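For context on the queue settings Nick mentions: the WPQ scheduler and the cutoff variable are both ceph.conf settings on the OSD. A minimal sketch, assuming the Jewel/Kraken-era option names (verify against your release, particularly given the bug Sam refers to):

```ini
# ceph.conf fragment -- the WPQ op scheduler Nick is running,
# plus the cutoff option. It is shown here set to "high" (routing
# more op classes through the strict-priority queue), which is the
# change Nick is holding off on until Sam's fix lands.
[osd]
osd_op_queue = wpq
osd_op_queue_cut_off = high
```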
>> >>
>> >> I'm guessing when the snap trimming was in the disk thread you wouldn't
>> >> have noticed these sleeps, but now that it's in the op thread it will
>> >> just sit there holding up all IO and be a lot more noticeable. It might
>> >> be that this option shouldn't be used with Jewel+?
>> >
>> > That's a good thought -- so we need confirmation of which thread is
>> > doing the snap trimming. I honestly can't figure it out from the code --
>> > hopefully a dev can explain how it works.
>> >
>> > Otherwise, I don't have much practical experience with snap trimming in
>> > Jewel yet -- our RBD cluster is still running 0.94.9.
>> >
>> > Cheers, Dan
>> >
>> >
>> >>
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:[email protected]] On
>> >>> Behalf Of Nick Fisk
>> >>> Sent: 13 January 2017 20:38
>> >>> To: 'Dan van der Ster' <[email protected]>
>> >>> Cc: 'ceph-users' <[email protected]>
>> >>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
>> >>> sleep?
>> >>>
>> >>> We're on Jewel and you're right, I'm pretty sure the snap stuff is
>> >>> also now handled in the op thread.
>> >>>
>> >>> The dump historic ops socket command showed a 10s delay at the
>> >>> "Reached PG" stage. Greg's response [1] would suggest that it isn't
>> >>> the OSD itself that is blocking, but rather the PG, which is sleeping
>> >>> whilst trimming. I think in the former case it would show a high time
>> >>> on the "Started" part of the op? Anyway, I will carry out some more
>> >>> testing with higher osd op threads and see if that makes any
>> >>> difference. Thanks for the suggestion.
>> >>>
>> >>> Nick
>> >>>
>> >>>
>> >>> [1]
>> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html
>> >>>
>> >>> > -----Original Message-----
>> >>> > From: Dan van der Ster [mailto:[email protected]]
>> >>> > Sent: 13 January 2017 10:28
>> >>> > To: Nick Fisk <[email protected]>
>> >>> > Cc: ceph-users <[email protected]>
>> >>> > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during
>> >>> > sleep?
>> >>> >
>> >>> > Hammer or Jewel? I've forgotten which thread pool is handling the
>> >>> > snap trim nowadays -- is it the op thread yet? If so, perhaps all
>> >>> > the op threads are stuck sleeping? Just a wild guess. (Maybe
>> >>> > increasing # op threads would help?)
>> >>> >
>> >>> > -- Dan
>> >>> >
>> >>> >
>> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> wrote:
>> >>> > > Hi,
>> >>> > >
>> >>> > > I had been testing some higher values of the osd_snap_trim_sleep
>> >>> > > variable to try and reduce the impact of removing RBD snapshots
>> >>> > > on our cluster, and I have come across what I believe to be an
>> >>> > > unintended consequence. The sleep seems to hold the lock on the
>> >>> > > PG open, so that no other IO can use the PG whilst the snap
>> >>> > > removal operation is sleeping.
>> >>> > >
>> >>> > > I had set the variable to 10s to completely minimise the impact,
>> >>> > > as I had some multi-TB snapshots to remove, and noticed that
>> >>> > > suddenly all IO to the cluster had a latency of roughly 10s as
>> >>> > > well; all the dumped ops show waiting on PG for 10s too.
>> >>> > >
>> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be used up
>> >>> > > to, say, a max of 0.1s, with this a known side effect, or should
>> >>> > > the lock on the PG be released so that normal IO can continue
>> >>> > > during the sleeps?
>> >>> > >
>> >>> > > Nick
>> >>> > >
>> >>> > > _______________________________________________
>> >>> > > ceph-users mailing list
>> >>> > > [email protected]
>> >>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
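The symptom described in this thread -- ops stalling for ~10s before "Reached PG" -- shows up in `ceph daemon osd.N dump_historic_ops` output. Below is a minimal sketch of filtering such a dump for ops that waited a long time to reach their PG; the JSON shape and field names here are illustrative assumptions modelled loosely on that command's output (the real layout varies between Ceph releases), not an exact schema:

```python
import json
from datetime import datetime

# Illustrative sample modelled loosely on `dump_historic_ops` output;
# treat the field names as assumptions, not a release's exact schema.
sample = json.loads("""
{
  "ops": [
    {
      "description": "osd_op(client.1234 rbd_data.abc [write 0~4096])",
      "events": [
        {"time": "2017-01-12 14:00:00.000000", "event": "initiated"},
        {"time": "2017-01-12 14:00:10.050000", "event": "reached_pg"},
        {"time": "2017-01-12 14:00:10.060000", "event": "done"}
      ]
    }
  ]
}
""")

FMT = "%Y-%m-%d %H:%M:%S.%f"

def wait_before_pg(op):
    """Seconds spent between 'initiated' and 'reached_pg'."""
    times = {e["event"]: datetime.strptime(e["time"], FMT)
             for e in op["events"]}
    return (times["reached_pg"] - times["initiated"]).total_seconds()

# Flag ops that spent more than 5s queued before reaching their PG --
# the signature Nick describes when osd_snap_trim_sleep holds the PG lock.
slow = [op for op in sample["ops"] if wait_before_pg(op) > 5.0]
for op in slow:
    print(f"{op['description']}: waited {wait_before_pg(op):.2f}s to reach PG")
```

On a live cluster the same check would be fed from the admin socket of each OSD rather than an embedded sample.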
