Hammer or Jewel? I've forgotten which thread pool handles the snap
trim nowadays -- is it the op threads yet? If so, perhaps all the op
threads are stuck sleeping? Just a wild guess. (Maybe increasing the
number of op threads would help?)
On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> I had been testing some higher values of the osd_snap_trim_sleep variable
> to try to reduce the impact of removing RBD snapshots
> on our cluster, and I have come across what I believe to be a possible
> unintended consequence. The sleep seems to keep
> the lock on the PG held, so that no other IO can use the PG whilst the snap
> removal operation is sleeping.
> I had set the variable to 10s to completely minimise the impact, as I had some
> multi-TB snapshots to remove, and noticed that suddenly
> all IO to the cluster had a latency of roughly 10s as well; all the dumped
> ops show waiting on the PG for 10s too.
> Is the osd_snap_trim_sleep variable only ever meant to be used up to, say, a
> max of 0.1s, with this being a known side effect? Or should
> the lock on the PG be released so that normal IO can continue during the
> sleep?
ceph-users mailing list
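The failure mode described in the thread can be sketched in a few lines. This is not Ceph code -- just a minimal Python model, assuming (as the report suggests) that the trim thread sleeps while still holding the per-PG lock, so every client op on that PG waits out the full sleep:

```python
import threading
import time

pg_lock = threading.Lock()  # stands in for a Ceph PG lock (illustrative only)

def trim_holding_lock(sleep_s):
    # Anti-pattern from the report: the trim thread sleeps *while holding*
    # the PG lock, so any client op on the same PG waits the full sleep_s.
    with pg_lock:
        time.sleep(sleep_s)

def trim_releasing_lock(sleep_s):
    # The suggested fix: do the (brief) trim work under the lock,
    # then perform the throttling sleep with the lock released.
    with pg_lock:
        pass  # trim one snapshot object here
    time.sleep(sleep_s)

def client_op_latency(trim_fn, sleep_s):
    # Start a trim, let it grab the lock, then time a client op on the same PG.
    t = threading.Thread(target=trim_fn, args=(sleep_s,))
    t.start()
    time.sleep(0.05)  # ensure the trim thread acquires the lock first
    start = time.monotonic()
    with pg_lock:  # a client IO hitting the same PG
        pass
    elapsed = time.monotonic() - start
    t.join()
    return elapsed

bad = client_op_latency(trim_holding_lock, 0.5)
good = client_op_latency(trim_releasing_lock, 0.5)
print(f"sleep under lock: {bad:.2f}s, sleep outside lock: {good:.2f}s")
```

With a 0.5s sleep the first client op stalls for roughly the whole sleep, while the second returns almost immediately -- the same pattern as the 10s latencies in the dumped ops above, scaled down.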