Hi Adrian,

I have also hit this recently and have since increased osd_snap_trim_sleep
to try to stop it from happening again. I haven't yet had an opportunity to
try to reproduce the problem, though, and your mail suggests it might not be
the silver bullet I was looking for.

I'm wondering whether the problem lies not with the removal of the snapshot
itself but with the number of object deletes that happen, as I see similar
results when running fstrim or deleting RBDs. Either way, I agree that a
settable throttle that lets the deletes be processed more slowly would be a
good addition. Have you tried setting that value higher than 1, maybe 10?
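
For reference, this is roughly the change I mean, as a sketch only. The value
of 10 is just the guess above, not something I have tested, and as far as I
understand the option is a sleep, in seconds, between snap trim operations on
each OSD.

```
[osd]
# Assumption: 10 is an untested starting point; the value currently in use is 1.
osd snap trim sleep = 10
```

It should also be possible to apply the same value at runtime, without
restarting the OSDs, using
ceph tell osd.* injectargs '--osd_snap_trim_sleep 10'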


> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
> Adrian Saul
> Sent: 22 September 2016 05:19
> To: 'ceph-users@lists.ceph.com' <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] Snap delete performance impact
>
> Any guidance on this?  I have osd_snap_trim_sleep set to 1 and it seems to
> have tempered some of the issues, but it's still bad enough that NFS storage
> off RBD volumes becomes unavailable for over 3 minutes.
>
> It seems that the activity in which the snapshot deletes are actioned
> triggers massive disk load for around 30 minutes.  The logs show OSDs marking
> each other out, OSDs complaining they are wrongly marked out, and blocked
> request errors for around 10 minutes at the start of this activity.
>
> Is there any way to throttle snapshot deletes to make them much more of a
> background activity?  It really should not make the platform unusable for
> 10 minutes.
>
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Adrian Saul
> > Sent: Wednesday, 6 July 2016 3:41 PM
> > To: 'ceph-users@lists.ceph.com'
> > Subject: [ceph-users] Snap delete performance impact
> >
> >
> > I recently started a process of using rbd snapshots to set up a backup
> > regime for a few file systems contained in RBD images.  While this
> > generally works well, at the time of the snapshots there is a massive
> > increase in latency (10ms to multiple seconds of rbd device latency)
> > across the entire cluster.  This has flow-on effects for some cluster
> > timeouts as well as general performance hits to applications.
> >
> > In my research I have found some references to osd_snap_trim_sleep being
> > the way to throttle this activity, but no real guidance on values for it.
> > I also see some other osd_snap_trim tunables (priority and cost).
> >
> > Are there any recommendations around setting these for a Jewel cluster?
> >
> > cheers,
> >  Adrian
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
