We're on Jewel and you're right, I'm pretty sure the snap stuff is also now handled in the op thread.
The dump_historic_ops admin socket command showed a 10s delay at the "Reached PG" stage. Going by Greg's response [1], that would suggest the OSD itself isn't blocking, but rather the PG, which is currently sleeping whilst trimming. I think in the former case there would be a high time on the "Started" part of the op?

Anyway, I will carry out some more testing with higher osd op threads and see if that makes any difference.

Thanks for the suggestion.

Nick

[1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html

> -----Original Message-----
> From: Dan van der Ster [mailto:[email protected]]
> Sent: 13 January 2017 10:28
> To: Nick Fisk <[email protected]>
> Cc: ceph-users <[email protected]>
> Subject: Re: [ceph-users] osd_snap_trim_sleep keeps locks PG during sleep?
>
> Hammer or jewel? I've forgotten which thread pool is handling the snap trim
> nowadays -- is it the op thread yet? If so, perhaps all the op threads are
> stuck sleeping? Just a wild guess. (Maybe increasing # op threads would
> help?).
>
> -- Dan
>
>
> On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> wrote:
> > Hi,
> >
> > I had been testing some higher values with the osd_snap_trim_sleep
> > variable to try and reduce the impact of removing RBD snapshots on our
> > cluster and I have come across what I believe to be a possible unintended
> > consequence. The value of the sleep seems to keep the lock on the PG open
> > so that no other IO can use the PG whilst the snap removal operation is
> > sleeping.
> >
> > I had set the variable to 10s to completely minimise the impact as I
> > had some multi TB snapshots to remove and noticed that suddenly all IO to
> > the cluster had a latency of roughly 10s as well, all the dumped ops show
> > waiting on PG for 10s as well.
> >
> > Is the osd_snap_trim_sleep variable only ever meant to be used up to
> > say a max of 0.1s and this is a known side effect, or should the lock on
> > the PG be removed so that normal IO can continue during the sleeps?
> >
> > Nick
> >
> > _______________________________________________
> > ceph-users mailing list
> > [email protected]
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
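
For anyone wanting to see which stage a dumped op spent its time in, here is a rough Python sketch that works out the gap between consecutive events. It is only an illustration: it assumes each op's timeline has already been pulled out of the dump_historic_ops output as a list of entries with "time" and "event" keys, and that timestamps look like "2017-01-13 10:00:00.000100". The sample data and the helper names (stage_gaps, slowest_stage) are made up for the example and are not part of Ceph.

```python
from datetime import datetime

# Hypothetical timeline for a single op. The values are invented to mirror
# the roughly 10s wait before "reached_pg" described in this thread.
SAMPLE_EVENTS = [
    {"time": "2017-01-13 10:00:00.000100", "event": "initiated"},
    {"time": "2017-01-13 10:00:00.000300", "event": "queued_for_pg"},
    {"time": "2017-01-13 10:00:10.000400", "event": "reached_pg"},
    {"time": "2017-01-13 10:00:10.000900", "event": "started"},
    {"time": "2017-01-13 10:00:10.004900", "event": "done"},
]

# Assumed timestamp layout; adjust if your dump prints times differently.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def stage_gaps(events):
    """Return (event_name, seconds_since_previous_event) pairs."""
    gaps = []
    previous = None
    for entry in events:
        stamp = datetime.strptime(entry["time"], TIME_FORMAT)
        if previous is not None:
            gaps.append((entry["event"], (stamp - previous).total_seconds()))
        previous = stamp
    return gaps


def slowest_stage(events):
    """Return the (event_name, seconds) pair with the longest wait before it."""
    return max(stage_gaps(events), key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, seconds in stage_gaps(SAMPLE_EVENTS):
        print(f"{name:15s} {seconds:10.6f}s")
    print("slowest:", slowest_stage(SAMPLE_EVENTS))
```

With a timeline like the sample, the long gap lands in front of "reached_pg" rather than "started", which is the pattern described above for an op waiting on the PG.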
