On 10/9/18 8:29 AM, Nir Soffer wrote:
On Fri, Oct 5, 2018 at 7:58 AM Eric Blake <ebl...@redhat.com> wrote:

On 10/4/18 12:05 AM, Eric Blake wrote:
The following (long) email describes a portion of the work-flow of how
my proposed incremental backup APIs will work, along with the backend
QMP commands that each one executes.  I will reply to this thread with
further examples (the first example is long enough to be its own email).
This is an update to a thread last posted here:
https://www.redhat.com/archives/libvir-list/2018-June/msg01066.html


More to come in part 2.


- Second example: a sequence of incremental backups via pull model

In the first example, we did not create a checkpoint at the time of the
full pull. That means we have no way to track a delta of changes since
that point in time.


Why do we want to support backup without creating a checkpoint?

Fleecing. If you want to examine a portion of the disk at a given point in time, then kicking off a pull model backup gives you access to the state of the disk at that time, and your actions are transient. Ending the job when you are done with the fleece cleans up everything needed to perform the fleece operation, and since you did not intend to capture a full (well, a complete) incremental backup, but were rather grabbing just a subset of the disk, you really don't want that point in time to be recorded as a new checkpoint.
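
For instance, a fleecing session might look roughly like this (a sketch only: the NBD port and export name are placeholders standing in for whatever the pull-mode server in the first example actually advertised, and the name of the command that ends the job is an assumption about this proposal's virsh interface):

$ $virsh backup-begin $dom backup.xml
# read just the regions of interest, frozen at the point backup-begin ran
$ qemu-io -r -f raw -c 'read -v 0 64k' nbd://localhost:10809/sdc
# done fleecing; tear down the job without recording a checkpoint
$ $virsh backup-end $dom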

Also, incremental backups (which are what require checkpoints) are limited to qcow2 disks, but full backups can be performed on any format (including raw disks). If you have a guest that does not use qcow2 disks, you can perform a full backup, but cannot create a checkpoint.
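
As a quick sanity check (a sketch only; $orig1 is the same shell variable used for the first qcow2 disk later in this mail), 'qemu-img info' reports the image format, so you can confirm up front whether a given disk can take part in checkpoints:

$ $qemu_img info $orig1 | grep 'file format'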


If we don't have any real use case, I suggest always requiring a
checkpoint.

But we do have real use cases for backup without a checkpoint.



Let's repeat the full backup (reusing the same
backup.xml from before), but this time, we'll add a new parameter, a
second XML file for describing the checkpoint we want to create.

Actually, it was easy enough to get virsh to write the XML for me
(because it was very similar to existing code in virsh that creates XML
for snapshot creation):

$ $virsh checkpoint-create-as --print-xml $dom check1 testing \
     --diskspec sdc --diskspec sdd | tee check1.xml
<domaincheckpoint>
    <name>check1</name>


We should use an id, not a name, even if the name is also unique, as it is
in most libvirt APIs.

In RHV we will always use a UUID for this.

Nothing prevents you from using a UUID as your name. But this particular choice of XML (<name>) matches what already exists in the snapshot XML.



    <description>testing</description>
    <disks>
      <disk name='sdc'/>
      <disk name='sdd'/>
    </disks>
</domaincheckpoint>

I had to supply two --diskspec arguments to virsh to select just the two
qcow2 disks that I am using in my example (rather than every disk in the
domain, which is the default when <disks> is not present).


So is <disks/> a valid configuration that selects all disks, or does
omitting the "disks" element select all disks?

It's about a one-line change to get whichever behavior you find more useful. Right now, I'm leaning towards: <disks> omitted == backup all disks, <disks> present: you MUST have at least one <disk> subelement that explicitly requests a checkpoint (because any omitted <disk> when <disks> is present is skipped). A checkpoint only makes sense as long as there is at least one disk to create a checkpoint with.

But I could also go with: <disks> omitted == backup all disks, <disks> present but <disk> subelements missing: the missing elements default to being backed up, and you have to explicitly provide <disk name='foo' checkpoint='no'> to skip a particular disk.
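
Under that alternative, a checkpoint description that explicitly opts one disk out might look like this (a sketch only, reusing the disk names from the example above; the checkpoint='no' attribute is the one being proposed in this paragraph):

<domaincheckpoint>
    <name>check1</name>
    <disks>
      <disk name='sdc'/>
      <disk name='sdd' checkpoint='no'/>
    </disks>
</domaincheckpoint>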

Or even: <disks> omitted, or <disks> present but <disk> subelements missing: the missing elements defer to the hypervisor for their default state, and the qemu hypervisor defaults to qcow2 disks being backed up/checkpointed and to non-qcow2 disks being omitted. But this latter one feels like more magic, which is harder to document and liable to go wrong.

A stricter version would be <disks> is mandatory, and no <disk> subelement can be missing (or else the API fails because you weren't explicit in your choice). But that's rather strict, especially since existing snapshots XML handling is not that strict.



I also picked
a name (mandatory) and description (optional) to be associated with the
checkpoint.

The backup.xml file that we plan to reuse still mentions scratch1.img
and scratch2.img as files needed for staging the pull request. However,
any contents in those files could interfere with our second backup
(after all, every cluster written into that file from the first backup
represents a point in time that was frozen at the first backup; but our
second backup will want to read the data as the guest sees it now rather
than what it was at the first backup), so we MUST regenerate the scratch
files. (Perhaps I should have just deleted them at the end of example 1
in my previous email, had I remembered when typing that mail).

$ $qemu_img create -f qcow2 -b $orig1 -F qcow2 scratch1.img
$ $qemu_img create -f qcow2 -b $orig2 -F qcow2 scratch2.img
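
As an optional sanity check (nothing in the workflow requires it), you can confirm that each freshly recreated scratch file points at the intended backing image:

$ $qemu_img info scratch1.img | grep 'backing file'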

Now, to begin the full backup and create a checkpoint at the same time.
Also, this time around, it would be nice if the guest had a chance to
freeze I/O to the disks prior to the point chosen as the checkpoint.
Assuming the guest is trusted, and running the qemu guest agent (qga),
we can do that with:

$ $virsh fsfreeze $dom
$ $virsh backup-begin $dom backup.xml check1.xml
Backup id 1 started
backup used description from 'backup.xml'
checkpoint used description from 'check1.xml'
$ $virsh fsthaw $dom


Great, this answers my (unsent) question about freeze/thaw from part 1 :-)


and eventually, we may decide to add a VIR_DOMAIN_BACKUP_BEGIN_QUIESCE
flag to combine those three steps into a single API (matching what we've
done on some other existing API).  In other words, the sequence of QMP
operations performed during virDomainBackupBegin is quick enough that
they won't stall a freeze operation (at least Windows is picky if you
stall a freeze operation longer than 10 seconds).
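
If that flag materializes, the combined call would presumably look like this (a sketch only, mirroring the proposed signature quoted below; the flag itself does not exist yet):

   virDomainBackupBegin(dom, "<domainbackup ...>",
     "<domaincheckpoint ...>", VIR_DOMAIN_BACKUP_BEGIN_QUIESCE)

with libvirt driving the guest agent freeze before, and the thaw after, the underlying QMP sequence.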


We use fsFreeze/fsThaw directly in RHV since we need to support external
snapshots (e.g. ceph), so we don't need this functionality, but it sounds
like a good idea to make it work like snapshots.

And indeed, a future enhancement will be figuring out how we can create a checkpoint at the same time as a snapshot (as mentioned elsewhere in this email). A snapshot and a checkpoint created at the same atomic point should obviously both be able to happen at a quiescent point in guest I/O.




The tweaked $virsh backup-begin now results in a call to:
   virDomainBackupBegin(dom, "<domainbackup ...>",
     "<domaincheckpoint ...", 0)
and in turn libvirt makes a similar sequence of QMP calls as before,
with a slight modification in the middle:
{"execute":"nbd-server-start",...
{"execute":"blockdev-add",...


This does not work yet for network disks like "rbd" and "glusterfs";
does that mean they will not be supported for backup?

Full backups can happen regardless of underlying format. But incremental backups require checkpoints, and checkpoints require qcow2 persistent bitmaps. As long as you have a qcow2 format on rbd or glusterfs, you should be able to create checkpoints on that image, and therefore perform incremental backups. Storage-wise, during a pull model backup, you would have your qcow2 format on remote glusterfs storage which is where the persistent bitmap is written, and temporarily also have a scratch qcow2 file on the local machine for performing copy-on-write needed to preserve the point in time semantics for as long as the backup operation is running.
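
So for a qcow2 disk that lives on gluster, the scratch file for this example could still be created locally, along these lines (a sketch only: the gluster URI is a made-up placeholder, and it assumes a qemu built with gluster support):

$ $qemu_img create -f qcow2 -b gluster://ghost/vol/guest1.qcow2 -F qcow2 scratch1.img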



{"execute":"transaction",
   "arguments":{"actions":[
    {"type":"blockdev-backup", "data":{
     "device":"$node1", "target":"backup-sdc", "sync":"none",
     "job-id":"backup-sdc" }},
    {"type":"blockdev-backup", "data":{
     "device":"$node2", "target":"backup-sdd", "sync":"none",
     "job-id":"backup-sdd" }}
    {"type":"block-dirty-bitmap-add", "data":{
     "node":"$node1", "name":"check1", "persistent":true}},
    {"type":"block-dirty-bitmap-add", "data":{
     "node":"$node2", "name":"check1", "persistent":true}}
   ]}}
{"execute":"nbd-server-add",...



What if this sequence fails in the middle? Will libvirt handle all failures
and roll back to the previous state?

What are the semantics of "execute": "transaction"? Does it mean that qemu
will handle all possible failures in one of the actions?

qemu already promises that a "transaction" succeeds or fails as a group. As to other failures, the full recovery sequence is handled by libvirt, and looks like:

Fail on "nbd-server-start":
 - nothing to roll back
Fail on first "blockdev-add":
 - nbd-server-stop
Fail on subsequent "blockdev-add" (see the sketch after this list):
 - blockdev-remove on earlier scratch file additions
 - nbd-server-stop
Fail on any "block-dirty-bitmap-add" or "x-block-dirty-bitmap-merge":
 - block-dirty-bitmap-remove on any temporary bitmaps that were created
 - blockdev-remove on all scratch file additions
 - nbd-server-stop
Fail on "transaction":
 - block-dirty-bitmap-remove on all temporary bitmaps
 - blockdev-remove on all additions
 - nbd-server-stop
Fail on "nbd-server-add" or "x-nbd-server-add-bitmap":
 - if a checkpoint was attempted during "transaction":
   -- perform x-block-dirty-bitmap-enable to re-enable the bitmap that was in use prior to the transaction
   -- perform x-block-dirty-bitmap-merge to merge the new bitmap into the re-enabled bitmap
   -- perform block-dirty-bitmap-remove on the new bitmap
 - block-job-cancel
 - block-dirty-bitmap-remove on all temporary bitmaps
 - blockdev-remove on all scratch file additions
 - nbd-server-stop
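
As a concrete illustration of the "fail on subsequent blockdev-add" case above (a sketch only: the node name assumes the sdc scratch node was the one successfully added with that name, and blockdev-del is the stable QMP spelling of the removal step listed above):

{"execute":"blockdev-del", "arguments":{"node-name":"backup-sdc"}}
{"execute":"nbd-server-stop"}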



More to come in part 3.

I still need to finish writing that, but part 3 will be a demonstration of the push model (where qemu writes the backup to a given destination, without a scratch file, and without an NBD server, but where you are limited to what qemu knows how to write).

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
