I tried a value of 2 this afternoon and saw the same results. Essentially the disks
appear to go to 100% busy doing very small but high numbers of IOs, and incur
massive service times (300-400ms). During that period I get blocked requests.
I suspect part of that might be that the SATA servers had filestore_op_threads set
too high and were hammering the disks with too much concurrent work, as they had
inherited a setting targeted for SSDs. I have wound that back to defaults on
those machines to see if it makes a difference.
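For reference, winding that back on just the SATA hosts could look something like the sketch below. The value of 2 is the assumed FileStore default for this era of Ceph; check `ceph daemon osd.N config show` on your build rather than taking it from here, and simply deleting the override is the safest option.

```
# ceph.conf on the SATA OSD hosts: drop the SSD-oriented override
# (or delete the line entirely so the compiled-in default applies)
[osd]
filestore op threads = 2   # assumed default; verify with "ceph daemon osd.N config show"
```

The change can also be pushed to running OSDs without a restart via `ceph tell osd.* injectargs '--filestore_op_threads 2'`, though injected values do not survive an OSD restart unless they are also in ceph.conf.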
But going by the disk activity, I suspect there are a lot of very small FS
metadata updates going on, and that is what is killing it.
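The osd_snap_trim_sleep throttle discussed in this thread is likewise just a config value; a minimal sketch of setting it persistently (assuming the Jewel-era default of 0, i.e. no throttle, with the value in seconds between trim operations):

```
# ceph.conf: slow snap trimming down so it behaves more like background work
[osd]
osd snap trim sleep = 2   # seconds to sleep between trim ops; assumed default is 0 (no throttle)
```

As with other OSD options, it can be injected at runtime with `ceph tell osd.* injectargs '--osd_snap_trim_sleep 2'` for experimentation before committing it to ceph.conf.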
> -----Original Message-----
> From: Nick Fisk [mailto:n...@fisk.me.uk]
> Sent: Thursday, 22 September 2016 7:06 PM
> To: Adrian Saul; email@example.com
> Subject: RE: Snap delete performance impact
> Hi Adrian,
> I have also hit this recently and have since increased the
> osd_snap_trim_sleep to try and stop this from happening again. However, I
> haven't had an opportunity to actually try and break it again yet, but your
> mail seems to suggest it might not be the silver bullet I was looking for.
> I'm wondering if the problem is not with the removal of the snapshot, but
> actually down to the amount of object deletes that happen, as I see similar
> results when doing fstrims or deleting RBDs. Either way, I agree that a
> settable throttle to allow it to process more slowly would be a good addition.
> Have you tried that value set to higher than 1, maybe 10?
> > -----Original Message-----
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Adrian Saul
> > Sent: 22 September 2016 05:19
> > To: 'firstname.lastname@example.org' <email@example.com>
> > Subject: Re: [ceph-users] Snap delete performance impact
> > Any guidance on this? I have osd_snap_trim_sleep set to 1 and it
> > seems to have tempered some of the issues, but it's still bad
> > that NFS storage off RBD volumes becomes unavailable for over 3 minutes.
> > It seems that the activity triggered when the snapshot deletes are actioned
> > creates massive disk load for around 30 minutes. The logs show
> > OSDs marking each other out, OSDs complaining they are wrongly marked
> > out, and blocked request errors for around 10 minutes at the start of this period.
> > Is there any way to throttle snapshot deletes to make them much more
> > of a background activity? It really should not make the
> > platform unusable for 10 minutes.
> > > -----Original Message-----
> > > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On
> > > Behalf Of Adrian Saul
> > > Sent: Wednesday, 6 July 2016 3:41 PM
> > > To: 'firstname.lastname@example.org'
> > > Subject: [ceph-users] Snap delete performance impact
> > >
> > >
> > > I recently started a process of using rbd snapshots to setup a
> > > backup regime for a few file systems contained in RBD images. While
> > > this generally works well at the time of the snapshots there is a
> > > massive increase in latency (10ms to multiple seconds of rbd device
> > > latency) across the entire cluster. This has flow on effects for
> > > some cluster timeouts as well as general performance hits to applications.
> > >
> > > In researching this I have found some references to osd_snap_trim_sleep being a
> > > way to throttle this activity, but no real guidance on values for it. I also
> > > found some other osd_snap_trim tunables (priority and cost).
> > >
> > > Are there any recommendations around setting these for a Jewel cluster?
> > >
> > > cheers,
> > > Adrian
> > > _______________________________________________
> > > ceph-users mailing list
> > > email@example.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > Confidentiality: This email and any attachments are confidential and
> > may be subject to copyright, legal or some other professional
> > privilege. They are intended solely for the attention and use of the
> > named addressee(s). They may only be copied, distributed or disclosed
> > with the consent of the copyright owner. If you have received this email by
> mistake or by breach of the confidentiality clause, please notify the sender
> immediately by return email and delete or destroy all copies of the email.
> Any confidentiality, privilege or copyright is not waived or lost because this
> email has been sent to you by mistake.