hi,

On Tue, Jun 30, 2020 at 04:49:01PM +0300, Nir Soffer wrote:
> On Tue, Jun 30, 2020 at 10:32 AM Michael Ablassmeier <a...@grinser.de> wrote:
> >  
> > https://tranfer_node:54322/images/d471c659-889f-4e7f-b55a-a475649c48a6/extents
> >
> > As I failed to find them, are there any existing functions/API calls
> > that could be used to download only the used extents to a file/FIFO
> > pipe?
> 
> To use _internal.io.copy to copy the image to tape, we need to solve
> several issues:
> 
> 1. how do you write the extents to tape so that you can extract them later?
> 2. provide a backend that knows how to stream data to tape in the right format
> 3. fix client.download() to consider the number of writers allowed
>    by the backend, since streaming to tape using multiple writers
>    will not be possible.

so, speaking as someone who works for a backup vendor, issues 1 and 2 are
already solved by our software; the backend is there, we just need a
way to extract the data from the API without storing it in a file
first. Something like:

 backup_vm.py full <vm_uuid> pipe

is already sufficient, as our backup client software would simply read
the data from the pipe and send it to our backend, which handles all
the tape communication and formatting.
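To make that concrete, here is a rough sketch of what such a pipe mode
could do internally, using only the stdlib. The /extents endpoint and
the "start"/"length"/"zero" fields follow the imageio HTTP API as I
understand it; the offset/length framing is invented for this example:

    # Rough sketch: stream only the allocated extents of a disk to stdout,
    # so a backup client can read them from a pipe. The /extents endpoint
    # and the "start"/"length"/"zero" fields follow the imageio HTTP API;
    # the offset/length framing is invented for this example.
    import json
    import ssl
    import struct
    import sys
    from http.client import HTTPSConnection
    from urllib.parse import urlparse

    def stream_extents(transfer_url, cafile):
        url = urlparse(transfer_url)
        ctx = ssl.create_default_context(cafile=cafile)
        con = HTTPSConnection(url.netloc, context=ctx)

        # Ask imageio which extents actually contain data.
        con.request("GET", url.path + "/extents")
        extents = json.loads(con.getresponse().read())

        out = sys.stdout.buffer
        for extent in extents:
            if extent.get("zero"):
                continue  # holes can be recreated on restore
            start, length = extent["start"], extent["length"]
            con.request("GET", url.path, headers={
                "Range": "bytes=%d-%d" % (start, start + length - 1)})
            res = con.getresponse()
            # Frame each extent as offset + length + raw data so the
            # consumer can restore it at the right offset later.
            out.write(struct.pack(">QQ", start, length))
            while length:
                chunk = res.read(min(length, 1024 * 1024))
                if not chunk:
                    raise RuntimeError("unexpected end of stream")
                out.write(chunk)
                length -= len(chunk)
        out.flush()

The consumer on the other end of the pipe just reads the frames and
hands them to the backend, which takes care of the tape format.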

The old implementation used the snapshot/attach feature, where our
backup client reads directly from the attached storage device and
sends the data to the backend, which takes care of multiplexing to
tape, possible deduplication, etc.

Tape is not the only use case here; most of the time our customers want
to write data to storage devices which do not expose a regular file
system (such as dedup services, StoreOnce, virtual tape solutions, etc.).

> To restore this backup, you need to:
> 1. find the tar in the tape (I have no idea how you would do this)
> 2. extract backup info from the tar
> 3. extract extents from the tar

1-3 are not an issue here; they are handled by our backend.

> 4. start an upload transfer
> 5. for each data extent:
>     read data from the tar member, and send to imageio using the right
>     offset and size 

that is some good information: so it is possible to create an empty disk
of the same size using the API and then directly send the extents at
their proper offsets. How does it look with an incremental backup on top
of a just-restored full backup? Does the imageio backend automatically
rebase and commit the data from the incremental backup during upload?
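For the restore direction, this is roughly what I picture, assuming
imageio accepts PUT requests with a Content-Range header for
random-access writes; creating the empty target disk and starting the
image transfer via the SDK is left out:

    # Rough sketch: write saved extents back into a new disk through an
    # imageio upload transfer. Assumes PUT with a Content-Range header is
    # accepted for random-access writes; creating the empty target disk
    # and starting the image transfer via the SDK is not shown.
    import ssl
    from http.client import HTTPSConnection
    from urllib.parse import urlparse

    def restore(transfer_url, cafile, extent_reader):
        # extent_reader yields (offset, data) pairs from our backend.
        url = urlparse(transfer_url)
        ctx = ssl.create_default_context(cafile=cafile)
        con = HTTPSConnection(url.netloc, context=ctx)
        for offset, data in extent_reader:
            end = offset + len(data) - 1
            con.request("PUT", url.path, body=data, headers={
                "Content-Range": "bytes %d-%d/*" % (offset, end)})
            res = con.getresponse()
            res.read()
            if res.status not in (200, 204):
                raise RuntimeError(
                    "PUT failed: %s %s" % (res.status, res.reason))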

As I understand it, requesting the extents directly and writing them to
a file leaves you with an image in raw format, which then needs to be
properly re-aligned with zeros and converted to qcow2 to be able to
commit any of the incremental backups I have stored somewhere. Since a
convert is possible during upload, does that mean we don't have to
rebuild the full/inc chain using a temporary file which we then upload?
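For reference, the temporary-file rebuild we would like to avoid looks
roughly like this; qemu-img rebase/commit are the usual qemu tools, not
imageio API, and the file names are only placeholders:

    # Sketch of the offline rebuild we are trying to avoid: restore the
    # full and incremental backups to temporary qcow2 files, chain them
    # with qemu-img, and merge before uploading. Names are placeholders.
    import subprocess

    def rebuild_chain(full_img, inc_imgs):
        backing = full_img
        for inc in inc_imgs:
            # Unsafe rebase: only rewrites the backing file reference,
            # no data is copied.
            subprocess.run(
                ["qemu-img", "rebase", "-u", "-F", "qcow2",
                 "-b", backing, inc],
                check=True)
            backing = inc
        # Merge each incremental down into its backing file, top first.
        for inc in reversed(inc_imgs):
            subprocess.run(["qemu-img", "commit", inc], check=True)
        return full_img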

> So the missing part is to create a connection to imageio and reading the data.
> 
> The easiest way is to use imageio._internal.backends.http, but note that this
> is internal now, so you should not use it outside of imageio. It is fine for
> writing proof of concept, and if you can show a good use case we can work
> on public API.

yes, that is what I noticed. My current solution would be to use the
internal functions to query the extent information and then continue
extracting the extents, to be able to pipe the data into our backend.

> You can write this using http.client.HTTPSConnection without using
> the http backend, but it would be a lot of code.

thanks for your example, I will give it a try during the POC implementation.

> We probably need to expose the backends or a simplified interface
> in the client public API to make it easier to write such applications.
> 
> Maybe something like:
> 
>      client.copy(src, dst)
> 
> Where src and dst are objects implementing imageio backend interface.
> 
> But before we do this we need to see some examples of real programs
> using imageio, to understand the requirements better.

the main feature for us would be the ability to read the data and
pipe it somewhere, which already works using the _internal API
functions, but having a stable interface for this would make it much
easier for any backup vendor to implement a client for the new API
in their software.
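Just to sketch what we would implement against such an interface, here
is a purely hypothetical destination object for the proposed
client.copy(src, dst); none of these method names are existing imageio
API, they only show what a pipe-based sink would need:

    # Purely hypothetical: a destination object for the proposed
    # client.copy(src, dst). None of these method names are existing
    # imageio API; they only show what a pipe-based sink would need.
    import struct

    class PipeSink:

        def __init__(self, fileobj):
            self._file = fileobj
            self._pos = 0

        def seek(self, offset):
            # A pipe cannot seek; remember the offset so write() can
            # emit it as framing for the backend.
            self._pos = offset

        def write(self, data):
            # Same offset/length framing as in the download sketch.
            self._file.write(struct.pack(">QQ", self._pos, len(data)))
            self._file.write(data)
            self._pos += len(data)

        def zero(self, length):
            # Holes are skipped, keeping the stream sparse.
            self._pos += length

        def flush(self):
            self._file.flush()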

If anyone is interested in hearing more thoughts about this, including
folks from Red Hat, don't hesitate to contact me directly to set up a call.

bye,
    - michael