Hi,

I have been testing some higher values of the osd_snap_trim_sleep variable to
try to reduce the impact of removing RBD snapshots on our cluster, and I have
come across what I believe to be an unintended consequence. The sleep appears
to hold the lock on the PG, so no other IO can use the PG while the snap
removal operation is sleeping.

I had set the variable to 10s to minimise the impact as much as possible, as I
had some multi-TB snapshots to remove, and noticed that suddenly all IO to the
cluster had a latency of roughly 10s as well; the dumped ops all show "waiting
on PG" for around 10s too.
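For anyone wanting to reproduce this, the setting can be applied persistently
in ceph.conf or injected at runtime (a sketch; the exact injectargs syntax may
vary slightly between releases):

```
# ceph.conf — persists across OSD restarts
[osd]
osd_snap_trim_sleep = 10

# or inject at runtime on all OSDs without a restart:
#   ceph tell osd.* injectargs '--osd_snap_trim_sleep 10'
```

The value is in seconds (fractional values like 0.05 are accepted), and it is
the per-object pause inserted between snap trim operations.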

Is osd_snap_trim_sleep only ever meant to be used with small values, say up to
a maximum of 0.1s, with this being a known side effect? Or should the PG lock
be released so that normal IO can continue during the sleeps?

Nick

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
