Snapshots are not a free action.  Creating them is close to free, but
deleting objects in Ceph is an n^2 operation.  On Hammer you do not
have access to the object-map feature on RBDs, which drastically reduces the
n^2 problem by keeping track of which objects actually need to be deleted.
For your week-old snapshot, the cluster has to throw every object in
the snapshot (whether it exists or not) into the snap_trim_q to be
deleted.  If you aren't familiar with what n^2 means: if a 1GB
volume/snapshot takes 4 minutes to delete, then a 2GB volume takes 16
minutes.
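
A toy sketch of that scaling claim, using the 4-minute figure from the
example above as the (hypothetical, not measured) baseline:

```python
# Toy illustration of the quadratic (n^2) deletion-cost claim above.
# The 4-minute baseline for a 1 GB snapshot is the example figure from
# the text, not a measured value from any real cluster.

def delete_minutes(size_gb, baseline_minutes=4.0):
    """Estimated deletion time if cost grows with the square of size."""
    return baseline_minutes * size_gb ** 2

for size in (1, 2, 4, 8):
    print(f"{size} GB -> {delete_minutes(size):.0f} min")
```

Doubling the volume size quadruples the estimated deletion time, which is
why large, old snapshots hurt so much more than small, fresh ones.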

Peter mentioned the setting that was implemented in Hammer; it is the ONLY
setting in Hammer that can help with snapshot deletions that thrash your
cluster.  You NEED to use osd_snap_trim_sleep.  Jewel broke that setting
without implementing an adequate work-around, but Jewel is
back on track now.  I would recommend an osd_snap_trim_sleep of about 0.05
to start with, to see if that alleviates your pressure.  It was a quick,
bad fix for a problem that they have finally revisited and addressed
properly.  What it does is sleep for 0.05 seconds after deleting each
snapshot object before moving on to the next one.  In Jewel that was
broken because they moved snapshot deletions into the main op thread, so
the snap trim sleep just put a sleep onto the main op thread, telling the
OSD thread to do nothing after deleting each snap trim object.
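
For reference, a sketch of how you might apply the setting: persistently in
ceph.conf and injected at runtime with `ceph tell`.  The 0.05 value is the
starting point suggested above, not a tuned recommendation for your hardware.

```ini
# ceph.conf -- applied to OSDs on restart
[osd]
osd snap trim sleep = 0.05
```

```shell
# Inject into all running OSDs without a restart
ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.05'
```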

Upgrading to Jewel and enabling object-map on all of your RBDs would help
with this problem, as would researching the new options in Jewel to fine-tune
snap trim settings for your environment and hardware.  I personally still
just use a small osd_snap_trim_sleep on my 3-node Proxmox cluster and it
works fine.  I don't get slow requests when I delete snapshots; I used to
before putting in a little snap trim sleep.  I only create snapshots about
once a month and cycle out the old ones, but it works well for me.
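
If you do upgrade, enabling object-map on an existing image looks roughly
like the sketch below (pool and image names are placeholders; object-map
requires exclusive-lock, and pre-existing images need the map rebuilt):

```shell
# Enable the features on an existing image (hypothetical pool/image names)
rbd feature enable mypool/myimage exclusive-lock object-map fast-diff

# Rebuild the object map so it reflects which objects already exist
rbd object-map rebuild mypool/myimage
```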

On Mon, Jun 26, 2017 at 8:07 AM Lindsay Mathieson <
[email protected]> wrote:

> On 26/06/2017 7:36 PM, Marco Gaiarin wrote:
> > Last week I used the snapshot feature for the first time. I did
> > some tests beforehand on some ''spare'' VMs, doing a snapshot on a
> > powered-off VM (as expected, it was nearly instantaneous) and on a
> > powered-on one (clearly, snapshotting the RAM puts some stress on that
> > VM, but not so much on the overall system, as expected).
> > I also did some tests deleting the snapshots I created, but only a few
> > minutes after making them, and nothing relevant happened.
>
>
>
> Have you tried restoring a snapshot? I found it unusably slow - as in
> hours.
>
> --
> Lindsay Mathieson
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>