Hammer or Jewel? I've forgotten which thread pool handles snap
trimming these days -- is it the op thread yet? If so, perhaps all the op
threads are stuck sleeping? Just a wild guess. (Maybe increasing the
number of op threads would help?)
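
For reference, the variable in question lives in the [osd] section of
ceph.conf. The fragment below is purely illustrative (the 0.05 value is
an arbitrary example, not a recommendation):

```ini
# ceph.conf -- illustrative fragment only, not a tuning recommendation.
# osd_snap_trim_sleep is the pause (in seconds) inserted between snap
# trim operations; per the report below, large values can stall other
# IO on the same PG while the trim op holds the PG lock and sleeps.
[osd]
osd snap trim sleep = 0.05
```

It can also be changed on a live cluster without a restart, e.g.
`ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'` (again, the
value shown is just an example).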

-- Dan


On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <n...@fisk.me.uk> wrote:
> Hi,
>
> I had been testing some higher values of the osd_snap_trim_sleep variable
> to try to reduce the impact of removing RBD snapshots
> on our cluster, and I have come across what I believe to be a possible
> unintended consequence. The sleep seems to hold
> the lock on the PG open, so that no other IO can use the PG whilst the snap
> removal operation is sleeping.
>
> I had set the variable to 10s to completely minimise the impact, as I had some
> multi-TB snapshots to remove, and noticed that suddenly
> all IO to the cluster had a latency of roughly 10s as well; all the dumped
> ops show waiting on the PG for around 10s too.
>
> Is the osd_snap_trim_sleep variable only ever meant to be used up to, say, a
> max of 0.1s, and this is a known side effect? Or should
> the lock on the PG be released so that normal IO can continue during the
> sleeps?
>
> Nick
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com