Hammer or Jewel? I've forgotten which thread pool is handling the snap trim nowadays -- is it the op thread yet? If so, perhaps all the op threads are stuck sleeping? Just a wild guess. (Maybe increasing the number of op threads would help?)
-- Dan

On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <[email protected]> wrote:
> Hi,
>
> I had been testing some higher values with the osd_snap_trim_sleep variable
> to try and reduce the impact of removing RBD snapshots on our cluster and I
> have come across what I believe to be a possible unintended consequence. The
> value of the sleep seems to keep the lock on the PG open so that no other IO
> can use the PG whilst the snap removal operation is sleeping.
>
> I had set the variable to 10s to completely minimise the impact as I had some
> multi TB snapshots to remove and noticed that suddenly all IO to the cluster
> had a latency of roughly 10s as well, all the dumped ops show waiting on PG
> for 10s as well.
>
> Is the osd_snap_trim_sleep variable only ever meant to be used up to say a
> max of 0.1s and this is a known side effect, or should the lock on the PG be
> removed so that normal IO can continue during the sleeps?
>
> Nick
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
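
For anyone following along, the effect Nick describes (a sleep taken while a lock is held stalls everyone else waiting on that lock for the full sleep) can be reproduced with a toy model. This is only a sketch in plain Python threading, not Ceph code: `pg_lock`, `snap_trimmer` and `client_wait` are hypothetical names made up for the illustration, and it says nothing about how the OSD actually implements the trim.

```python
import threading
import time


def client_wait(sleep_s, hold_lock_while_sleeping):
    """Return how long a 'client op' waits for the (toy) PG lock.

    A toy model only: one lock stands in for the PG lock, one thread
    stands in for the snap trimmer. None of this is Ceph code.
    """
    pg_lock = threading.Lock()
    trimmer_has_lock = threading.Event()

    def snap_trimmer():
        if hold_lock_while_sleeping:
            # Sleep inside the critical section: the lock stays held.
            with pg_lock:
                trimmer_has_lock.set()
                time.sleep(sleep_s)
        else:
            # Release the lock first, then sleep.
            with pg_lock:
                trimmer_has_lock.set()
            time.sleep(sleep_s)

    trimmer = threading.Thread(target=snap_trimmer)
    trimmer.start()
    trimmer_has_lock.wait()  # make sure the trimmer got the lock first

    start = time.monotonic()
    with pg_lock:  # the "client op" needs the same lock
        waited = time.monotonic() - start
    trimmer.join()
    return waited


if __name__ == "__main__":
    print("sleep with lock held:", round(client_wait(0.2, True), 3), "s")
    print("sleep after release: ", round(client_wait(0.2, False), 3), "s")
```

In the first case the client waits for roughly the whole sleep; in the second it gets the lock almost immediately. That matches the symptom of every op showing about 10s of "waiting on PG" with a 10s sleep, if the sleep is indeed taken under the PG lock.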
