Have you also tried setting osd_snap_trim_cost to 16777216 (16x the
default value, equivalent to a 16MB IO) and
osd_pg_max_concurrent_snap_trims to 1 (down from 2)?
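For reference, a sketch of how those settings would look in ceph.conf (option names follow the Jewel-era defaults; verify them against your build with "ceph daemon osd.0 config show | grep snap_trim" before relying on this):

```ini
[osd]
; Raise the scheduler cost of each snap trim op so WPQ deprioritises it.
; 16777216 = 16MB, i.e. 16x the default of 1048576.
osd_snap_trim_cost = 16777216
; Allow only one in-flight trim per PG (default is 2).
osd_pg_max_concurrent_snap_trims = 1
```

They should also be injectable at runtime with something like:
ceph tell osd.* injectargs '--osd_snap_trim_cost 16777216 --osd_pg_max_concurrent_snap_trims 1'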
-Sam

On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <[email protected]> wrote:
> Hi Sam,
>
> Thanks for confirming both which thread the trimming happens in and my 
> suspicion that sleeping is now a bad idea.
>
> The problem I see is that even with the trimming priority set low, it 
> still seems to completely swamp the cluster. The trims seem to be 
> submitted asynchronously, which leaves all my disks sitting at queue 
> depths of 50+ for several minutes until the snapshot is removed, often 
> also causing several OSDs to get marked out and start flapping. I'm using 
> WPQ but haven't changed the cutoff variable yet, as I know you are working 
> on fixing a bug with that.
>
> Nick
>
>> -----Original Message-----
>> From: Samuel Just [mailto:[email protected]]
>> Sent: 19 January 2017 15:47
>> To: Dan van der Ster <[email protected]>
>> Cc: Nick Fisk <[email protected]>; ceph-users <[email protected]>
>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> Snaptrimming is now in the main op threadpool along with scrub, recovery, 
>> and client IO.  I don't think it's a good idea to use any of the
>> _sleep configs anymore -- the intention is that by setting the priority low, 
>> they won't actually be scheduled much.
>> -Sam
>>
>> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster <[email protected]> 
>> wrote:
>> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <[email protected]> wrote:
>> >> Hi Dan,
>> >>
>> >> I carried out some more testing after doubling the op threads. It may
>> >> have had a small benefit, as potentially some threads are available,
>> >> but latency still sits more or less around the configured snap sleep
>> >> time. Even more threads might help, but I suspect that would just
>> >> lower the chance of IOs getting stuck behind the sleep, rather than
>> >> actually solve the problem.
>> >>
>> >> I'm guessing that when the snap trimming was in the disk thread, you
>> >> wouldn't have noticed these sleeps, but now that it's in the op thread
>> >> it will just sit there holding up all IO and be a lot more noticeable.
>> >> It might be that this option shouldn't be used with Jewel+?
>> >
>> > That's a good thought -- so we need confirmation of which thread is
>> > doing the snap trimming. I honestly can't figure it out from the code --
>> > hopefully a dev could explain how it works.
>> >
>> > Otherwise, I don't have much practical experience with snap trimming
>> > in jewel yet -- our RBD cluster is still running 0.94.9.
>> >
>> > Cheers, Dan
>> >
>> >
>> >>
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:[email protected]] On
>> >>> Behalf Of Nick Fisk
>> >>> Sent: 13 January 2017 20:38
>> >>> To: 'Dan van der Ster' <[email protected]>
>> >>> Cc: 'ceph-users' <[email protected]>
>> >>> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during 
>> >>> sleep?
>> >>>
>> >>> We're on Jewel and you're right, I'm pretty sure the snap stuff is
>> >>> also now handled in the op thread.
>> >>>
>> >>> The dump historic ops socket command showed a 10s delay at the
>> >>> "Reached PG" stage. From Greg's response [1], this would suggest
>> >>> that it isn't the OSD itself that's blocking, but the PG it's
>> >>> currently sleeping on whilst trimming. I think in the former case,
>> >>> the op would have shown a high time on the "Started" stage? Anyway,
>> >>> I will carry out some more testing with higher osd op threads and
>> >>> see if that makes any difference. Thanks for the suggestion.
>> >>>
>> >>> Nick
>> >>>
>> >>>
>> >>> [1]
>> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html
>> >>>
>> >>> > -----Original Message-----
>> >>> > From: Dan van der Ster [mailto:[email protected]]
>> >>> > Sent: 13 January 2017 10:28
>> >>> > To: Nick Fisk <[email protected]>
>> >>> > Cc: ceph-users <[email protected]>
>> >>> > Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during 
>> >>> > sleep?
>> >>> >
>> >>> > Hammer or Jewel? I've forgotten which thread pool is handling the
>> >>> > snap trim nowadays -- is it the op thread yet? If so, perhaps all
>> >>> > the op threads are stuck sleeping? Just a wild guess. (Maybe
>> >>> > increasing the number of op threads would help?)
>> >>> >
>> >>> > -- Dan
>> >>> >
>> >>> >
>> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> wrote:
>> >>> > > Hi,
>> >>> > >
>> >>> > > I had been testing some higher values of the osd_snap_trim_sleep
>> >>> > > variable to try and reduce the impact of removing RBD snapshots
>> >>> > > on our cluster, and I have come across what I believe to be a
>> >>> > > possible unintended consequence. The sleep seems to keep the
>> >>> > > lock on the PG held, so that no other IO can use the PG whilst
>> >>> > > the snap removal operation is sleeping.
>> >>> > >
>> >>> > > I had set the variable to 10s to completely minimise the impact,
>> >>> > > as I had some multi-TB snapshots to remove, and noticed that
>> >>> > > suddenly all IO to the cluster had a latency of roughly 10s as
>> >>> > > well; all the dumped ops show waiting on the PG for 10s too.
>> >>> > >
>> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be used up
>> >>> > > to, say, a max of 0.1s, and this is a known side effect? Or
>> >>> > > should the lock on the PG be released so that normal IO can
>> >>> > > continue during the sleeps?
>> >>> > >
>> >>> > > Nick
>> >>> > >
>> >>> > > _______________________________________________
>> >>> > > ceph-users mailing list
>> >>> > > [email protected]
>> >>> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >>>
>> >>
>
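As an aside, the per-event delays Nick describes can be pulled out of the admin socket output programmatically rather than by eye. A minimal sketch (the JSON below is a hypothetical, trimmed example of what "ceph daemon osd.N dump_historic_ops" returns; the field names follow the Jewel-era format and should be checked against your version):

```python
import json
from datetime import datetime

# Hypothetical dump_historic_ops output, trimmed to the fields that matter.
# Real output comes from: ceph daemon osd.N dump_historic_ops
sample = json.loads("""
{
  "Ops": [
    {
      "description": "osd_op(client.4123.0:1 rbd_data.x [write 0~4096])",
      "duration": "10.012",
      "type_data": [
        "commit sent",
        [
          {"time": "2017-01-12 14:00:00.000000", "event": "initiated"},
          {"time": "2017-01-12 14:00:00.000100", "event": "queued_for_pg"},
          {"time": "2017-01-12 14:00:10.000200", "event": "reached_pg"},
          {"time": "2017-01-12 14:00:10.012000", "event": "commit sent"}
        ]
      ]
    }
  ]
}
""")

def event_gaps(op):
    """Return (event, seconds elapsed since previous event) for each event."""
    events = op["type_data"][1]
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    times = [datetime.strptime(e["time"], fmt) for e in events]
    return [(events[i]["event"], (times[i] - times[i - 1]).total_seconds())
            for i in range(1, len(events))]

for op in sample["Ops"]:
    for event, gap in event_gaps(op):
        if gap > 1.0:  # flag anything that stalled for more than a second
            print(f"{gap:.3f}s before {event}: {op['description']}")
```

An op held up by a 10s trim sleep shows almost all of its duration in the gap before "reached_pg", which matches what Nick saw in his dumps.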
