On Tue, Jul 27, 2021 at 12:16:59PM +0100, Richard W.M. Jones wrote:
Hi Eric, a couple of questions below about nbdkit performance.

Modular virt-v2v will use disk pipelines everywhere.  The input
pipeline looks something like this:

 socket <- cow filter <- cache filter <-   nbdkit
                                          curl|vddk

We found there's a notable slowdown in at least one case: when the
source plugin is very slow (e.g. the curl plugin fetching from a slow,
remote website, or VDDK in general), everything runs very slowly.

I made a simple test case to demonstrate this:

$ virt-builder fedora-33
$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=500ms \
    --run 'virt-inspector --format=raw -a "$uri" -vx'

This uses a local file with the delay filter on top injecting half
second delays into every read.  It "feels" a lot like the slow case we
were observing.  Virt-v2v also does inspection as a first step when
converting an image, so using virt-inspector is somewhat realistic.

Unfortunately this runs far too slowly for me to wait for it to
finish: at least 30 minutes, and probably a lot longer.  This compares
to only 7 seconds if you remove the delay filter.

Reducing the delay to 50ms at least lets it finish in a reasonable time:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=50ms \
    --run 'virt-inspector --format=raw -a "$uri"'

real    5m16.298s
user    0m0.509s
sys     0m2.894s

In the above scenario the cache filter is not actually doing anything
(since virt-inspector does not write).  Adding cache-on-read=true lets
us cache the reads, avoiding going through the "slow" plugin in many
cases, and the result is a lot better:

$ time ./nbdkit --filter=cache --filter=delay file /var/tmp/fedora-33.img \
    delay-read=50ms cache-on-read=true \
    --run 'virt-inspector --format=raw -a "$uri"'

real    0m27.731s
user    0m0.304s
sys     0m1.771s

However this is still slower than the old method, which used qcow2 +
qemu's copy-on-read.  It's harder to demonstrate this, but I modified
virt-inspector to use the copy-on-read setting (which it doesn't
normally use).  On top of nbdkit with a 50ms delay and no other filters:

qemu + copy-on-read backed by nbdkit delay-read=50ms file:
real    0m23.251s

So 23s is the time to beat.  (I believe that with longer delays, the
gap between qemu and nbdkit increases in favour of qemu.)

Q1: What other ideas could we explore to improve performance?


First thing that came to mind: could it be that qemu's copy-on-read
caches bigger blocks, effectively doing some small read-ahead as well?

- - -

In real scenarios we'll actually want to combine cow + cache, where
cow is caching writes, and cache is caching reads.

 socket <- cow filter <- cache filter        <- nbdkit
                         cache-on-read=true     curl|vddk

The cow filter is necessary to prevent changes being written back to
the pristine source image.
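To make that concrete, here is a minimal copy-on-write sketch in
Python (not nbdkit's implementation; it assumes block-aligned I/O for
brevity, which a real filter cannot): writes land in an overlay, reads
of unmodified blocks fall through, and the source is never written.

```python
# Toy copy-on-write overlay: keep all writes out of the pristine source.

class Source:
    """Pristine read-only image; must never see writes."""
    def __init__(self, size):
        self.data = bytearray(size)          # all zeroes

    def pread(self, offset, length):
        return bytes(self.data[offset:offset + length])

class CowOverlay:
    """Writes go to an in-memory overlay; clean blocks read through."""
    BLOCK = 512

    def __init__(self, source):
        self.source = source
        self.overlay = {}                    # block number -> bytearray

    def _block(self, blk):
        if blk not in self.overlay:
            # Copy-on-write: pull the pristine block into the overlay
            # before modifying it.
            self.overlay[blk] = bytearray(
                self.source.pread(blk * self.BLOCK, self.BLOCK))
        return self.overlay[blk]

    def pread(self, offset, length):
        # Assumes offset is block-aligned and length <= BLOCK.
        blk = offset // self.BLOCK
        if blk in self.overlay:
            return bytes(self.overlay[blk][:length])
        return self.source.pread(offset, length)

    def pwrite(self, offset, data):
        # Assumes the write is block-aligned and fits in one block.
        self._block(offset // self.BLOCK)[:len(data)] = data

src = Source(4096)
cow = CowOverlay(src)
cow.pwrite(0, b"hello")
print(cow.pread(0, 5))    # the overlay sees the write
print(src.pread(0, 5))    # the source stays pristine (all zeroes)
```

A cow-on-read flag would essentially make `_block` run on the read
path too, which is why it could subsume the cache filter's
cache-on-read behaviour in this pipeline.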

This is actually surprisingly efficient, making no noticeable
difference in this test:

$ time ./nbdkit --filter=cow --filter=cache --filter=delay \
    file /var/tmp/fedora-33.img \
    delay-read=50ms cache-on-read=true \
    --run 'virt-inspector --format=raw -a "$uri"'

real    0m27.193s
user    0m0.283s
sys     0m1.776s

Q2: Should we consider a "cow-on-read" flag to the cow filter (thus
removing the need to use the cache filter at all)?


That would make at least some sense, since there is cow-on-cache
already (albeit a little confusing for me personally).  I presume it
would not increase the size of the difference (when using qemu-img
rebase) at all, right?  However, I do not see how it would be faster
than the existing:

  cow <- cache[cache-on-read]

Martin


Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

_______________________________________________
Libguestfs mailing list
[email protected]
https://listman.redhat.com/mailman/listinfo/libguestfs
