On Tue, Jun 30, 2020 at 10:32 AM Michael Ablassmeier <a...@grinser.de> wrote:
>
> hi,
>
> I'm currently looking at the new incremental backup API that has been
> part of the 4.4 and RHV 4.4-beta release. So far I was able to create
> full/incremental backups and restore without any problem.
>
> Now, using the backup_vm.py example from the ovirt-engine-sdk, I see
> that the following happens during a full backup:
>
> 1) imageio client api requests transfer
> 2) starts qemu-img to create a local qemu image with same size
> 3) starts qemu-nbd to serve this image
> 4) reads used extents from provided imageio source, passes data to
>    qemu-nbd process
> 5) resulting file is a thin provisioned qcow image with the actual
>    data of the VM's used space.
>
> While this works great, it has one downside: if I back up a virtual
> machine with lots of used extents, or multiple virtual machines at the
> same time, I may run out of space if my primary backup target is
> not a regular disk.
>
> Imagine I want to stream the FULL backup to tape directly like
>
>     backup_vm.py full [..] <vm_uuid> /dev/nst0
>
> That's currently not possible, because qemu-img is not able to open
> a tape device directly, given the nature of the qcow2 format.
>
> So what I am basically looking for is a way to download only the extents
> from the imageio server that are really in use, not depending on qemu-*
> tools, to be able to pipe the data somewhere else.
>
> Standard tools, like for example curl, will always download the full
> provisioned image from the imageio backend (of course).
>
> I noticed that it is possible to query the extents via:
>
>     https://tranfer_node:54322/images/d471c659-889f-4e7f-b55a-a475649c48a6/extents
>
> As I failed to find them, are there any existing functions/api calls
> that could be used to download only the used extents to a file/fifo
> pipe?
>
> So far, I played around with the _internal.io.copy function, being able
> to at least read the data into an in-memory BytesIO stream, but that's not
> the solution to my "problem" :)

To use _internal.io.copy to copy the image to tape, we need to solve
several issues:

1. how do you write the extents to tape so that you can extract them later?
2. provide a backend that knows how to stream data to tape in the right format
3. fix client.download() to consider the number of writers allowed by
   the backend, since streaming to tape using multiple writers will not
   be possible.

I think we can start with a simple implementation using the imageio API,
and once we have a working solution, we can consider making a backend.

A possible solution for 1 is to use tar format, creating one tar per
backup.

The tar structure can be:

- backup info - json file with information about this backup like vm id,
  disk id, date, checkpoint, etc.
- extents - the json returned from imageio as is. Using this json you can
  later restore every extent to the right location in the restored image
- extent 1 - first data extent (zero=False)
...
- extent N - last data extent

To restore this backup, you need to:

1. find the tar in the tape (I have no idea how you would do this)
2. extract backup info from the tar
3. extract extents from the tar
4. start an upload transfer
5. for each data extent:
   read data from the tar member, and send to imageio using the right
   offset and size

Other formats are possible, but reusing tar seems like the easiest way
and will make it easier to write and read backups from tapes.

Creating a tar file and adding items using streaming can be done like this:

    import tarfile

    with tarfile.open("/dev/xxx", "w|") as tar:
        # Create tarinfo for extent-N.
        # Setting other attributes may be needed.
        tarinfo = tarfile.TarInfo("extent-{}".format(extent_number))
        tarinfo.size = extent_size

        # reader must implement read(n), providing tarinfo.size bytes.
        tar.addfile(tarinfo, fileobj=reader)

I never tried to write directly to tape with python tarfile, but it should
work.

So the missing part is creating a connection to imageio and reading the
data.

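The tarfile steps above can be tried without a tape drive. This is a minimal sketch, assuming an in-memory io.BytesIO stands in for the tape device; add_member() is a hypothetical helper name, not part of imageio or tarfile. It writes one member in streaming mode ("w|") and reads it back in streaming mode ("r|"), which is how a non-seekable device would have to be read.

```python
import io
import tarfile


def add_member(tar, name, size, reader):
    """Add one tar member of exactly `size` bytes read from `reader`.

    Hypothetical helper wrapping the TarInfo steps; `reader` only needs
    to implement read(n).
    """
    info = tarfile.TarInfo(name)
    info.size = size
    tar.addfile(info, fileobj=reader)


# Stand-in for the tape device (assumption: real code would open /dev/nst0).
tape = io.BytesIO()

# "w|" writes a pure stream of tar blocks, no seeking required.
with tarfile.open(fileobj=tape, mode="w|") as tar:
    payload = b"\x01" * 4096
    add_member(tar, "extent-0", len(payload), io.BytesIO(payload))

# Rewind and read the stream back sequentially, as from a tape.
tape.seek(0)
with tarfile.open(fileobj=tape, mode="r|") as tar:
    # In stream mode each member must be read before moving to the next.
    restored = {m.name: tar.extractfile(m).read() for m in tar}
```

Whether a real tape device accepts the block sizes tarfile uses by default is not tested here; the bufsize argument of tarfile.open() may need tuning for that.
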
The easiest way to connect to imageio and read the data is to use
imageio._internal.backends.http, but note that this is internal now, so
you should not use it outside of imageio. It is fine for writing a proof
of concept, and if you can show a good use case we can work on a public
API.

With that backend, you can do this:

    import io
    import json

    from imageio._internal.backends import http

    with http.Backend(transfer_url, cafile) as backend:
        extents = list(backend.extents("zero"))

        # Write extents to tarfile. Assuming you wrote a helper write_to_tar()
        # doing the TarInfo dance. io.BytesIO() needs bytes, so encode the
        # json text.
        extents_data = json.dumps(
            [extent.to_dict() for extent in extents]).encode("utf-8")
        write_to_tar("extents", len(extents_data), io.BytesIO(extents_data))

        for n, extent in enumerate(e for e in extents if not e.zero):
            # Seek to start of extent. Reading extent.length bytes will
            # return extent data.
            backend.seek(extent.start)

            # Backends do not implement read() and it would be inefficient to
            # implement read. This is a quick hack to make it possible to
            # integrate other code expecting file-like objects.
            # reader is an http.client.HTTPResponse instance, implementing
            # read().
            reader = backend._get(extent.length)

            write_to_tar("extent-{}".format(n), extent.length, reader)

For incremental backup, you will need to change:

    extents = list(backend.extents("dirty"))
    ...
    for n, extent in enumerate(e for e in extents if e.dirty):

You can write this using http.client.HTTPSConnection without using
the http backend, but it would be a lot of code.

We probably need to expose the backends or a simplified interface
in the client public API to make it easier to write such applications.

Maybe something like:

    client.copy(src, dst)

Where src and dst are objects implementing the imageio backend interface.

But before we do this we need to see some examples of real programs
using imageio, to understand the requirements better.

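For the restore side, steps 2, 3 and 5 of the restore list can be sketched without imageio at all. This is only an illustration under stated assumptions: the tar was written in the order described earlier ("extents" first, then one "extent-N" member per data extent), the extents json uses the keys "start", "length" and "zero", and iter_restore_chunks() is a hypothetical name. Sending each chunk to an upload transfer is left out.

```python
import io
import json
import tarfile


def iter_restore_chunks(tar):
    """Yield (start, data) for every data extent in a streamed backup tar.

    Assumes the "extents" member comes before the "extent-N" members and
    that "extent-N" members are stored in the order of the data extents.
    """
    data_extents = []
    index = 0
    for member in tar:
        if member.name == "extents":
            raw = tar.extractfile(member).read()
            extents = json.loads(raw.decode("utf-8"))
            data_extents = [e for e in extents if not e["zero"]]
        elif member.name.startswith("extent-"):
            extent = data_extents[index]
            index += 1
            # Reads the whole extent into memory; real code would copy in
            # smaller chunks to the upload transfer at extent["start"].
            yield extent["start"], tar.extractfile(member).read()


def add(tar, name, data):
    # Test helper (hypothetical): store `data` as a tar member.
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, fileobj=io.BytesIO(data))


# Build a tiny fake backup in memory: two data extents around a zero extent.
extents = [
    {"start": 0, "length": 4, "zero": False},
    {"start": 4, "length": 8, "zero": True},
    {"start": 12, "length": 4, "zero": False},
]
stream = io.BytesIO()
with tarfile.open(fileobj=stream, mode="w|") as tar:
    add(tar, "extents", json.dumps(extents).encode("utf-8"))
    add(tar, "extent-0", b"aaaa")
    add(tar, "extent-1", b"bbbb")

# Read it back sequentially and pair each member with its image offset.
stream.seek(0)
with tarfile.open(fileobj=stream, mode="r|") as tar:
    chunks = list(iter_restore_chunks(tar))
```

Keeping the extents json as the first member is what lets a single sequential pass over the tape work, since the offsets are known before any extent data arrives.
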
Nir