Mark McLoughlin wrote:
Hi Avi,

Sorry, I got distracted from this ...


So did I :)

  1) The length of the tx mitigation timer makes quite a difference to
     throughput achieved; we probably need a good heuristic for
     adjusting this on the fly.

The tx mitigation timer is just one part of the equation; the other is the virtio ring window size, which is currently fixed.

Using a maximum-sized window is good when the guest and host are running flat out, doing nothing but networking. When throughput drops (because the guest is spending CPU on processing, or simply because the other side is not keeping up), we need to drop the window size so as to retain acceptable latencies.

The tx timer can then be set to "a bit after the end of the window", acting as a safety belt in case the throughput changes.
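
As a rough illustration of "a bit after the end of the window", the timer can be derived from the window size and the rate at which a flat-out guest queues packets. This is only a sketch: the function name, the packet-rate input and the 25% margin are assumptions made for the example, not anything taken from the qemu or lguest code.

```python
def tx_timer_us(window_pkts, pkts_per_sec, margin=0.25):
    """Timer in microseconds: the time a flat-out guest needs to fill
    the window, plus a safety margin (hypothetical default of 25%)."""
    if window_pkts <= 0 or pkts_per_sec <= 0:
        raise ValueError("window and rate must be positive")
    # Time to queue window_pkts packets at the given rate.
    fill_time_us = window_pkts * 1_000_000 / pkts_per_sec
    # Fire slightly after the window should have filled on its own.
    return fill_time_us * (1.0 + margin)
```

With a smaller window the same rate gives a proportionally shorter timer, which is why the two knobs have to be tuned together.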

i.e. the tx timer should give just enough time for a flat out guest to
fill the ring, and no more?

Yep, that's basically what lguest's tx timer heuristic is aiming for
AFAICT.

Yes, but that's not enough. If networking is slow (for whatever reason) we need to drop the window size, to make sure the timer never fires under steady state circumstances.

Thinking about it, we could have an explicit "worst case latency" parameter (instead of the implicit "flat out guest fills ring") and set the timer to that. Then adjust the window size to be as large as we can make it without seeing the timer expire.
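
One possible shape for that heuristic, sketched in Python for brevity. The timer is pinned to the worst-case latency and only the window moves. The class and method names are hypothetical, and the grow-by-one / halve-on-expiry policy is an assumption chosen for the example; the thread does not specify how the window should be stepped.

```python
class TxWindow:
    """Window size adapted so the latency timer stays quiet in steady state."""

    def __init__(self, ring_size, worst_case_latency_us, min_window=1):
        self.ring_size = ring_size
        self.timer_us = worst_case_latency_us  # timer is fixed to the latency bound
        self.min_window = min_window
        self.window = ring_size                # start with the maximum window

    def on_window_filled(self):
        # The guest filled the window before the timer fired:
        # probe upward, but never beyond the ring itself.
        self.window = min(self.ring_size, self.window + 1)

    def on_timer_expired(self):
        # The timer fired first, so the window is too large for the
        # current throughput: back off quickly (assumed policy: halve).
        self.window = max(self.min_window, self.window // 2)
```

The flush path would call on_window_filled() when it is triggered by the window filling up and on_timer_expired() when it is triggered by the timer.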

  4) Dropping the global mutex while reading GSO packets from the tap
     interface gives a nice speedup. This highlights the global mutex
     as a general performance issue.

Not sure whether this is safe. What's stopping the guest from accessing virtio and changing some state?

With the current code, the virtio state should be consistent before we
drop the mutex. The I/O thread would only drop the lock while it reads
into the tap buffer and then grab the lock again before popping a buffer
from the ring and copying to it.
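
The ordering described above can be sketched as follows. This is a Python model, not the qemu code: the lock, the two rings and tap_receive() are stand-ins invented for the example. The point is only that nothing shared with the guest is touched while the lock is dropped, because the read goes into a private buffer.

```python
import threading
from collections import deque

global_mutex = threading.Lock()  # stand-in for the global mutex
avail_ring = deque()             # guest buffers; touched only under the mutex
used_ring = deque()              # (buffer, length) pairs handed back to the guest

def tap_receive(read_packet):
    """Called with global_mutex held; returns with it held."""
    global_mutex.release()
    try:
        # Blocking read into a private buffer; no guest-visible state is
        # referenced here, so a ring reset cannot invalidate anything.
        packet = read_packet()
    finally:
        global_mutex.acquire()
    # Back under the lock: only now look at the ring.
    if not avail_ring or len(avail_ring[0]) < len(packet):
        return False             # no suitable guest buffer, drop the packet
    guest_buf = avail_ring.popleft()
    guest_buf[:len(packet)] = packet
    used_ring.append((guest_buf, len(packet)))
    return True
```

The cost of this ordering is the extra copy from the private buffer into the guest buffer.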


Right, tap_send() is called outside virtio-net context.

With Anthony's zero-copy patch, the situation is less clear - we pop a
buffer from the avail ring, drop the lock, read() into the buffer, grab
the lock and then push the buffer back onto the used ring. While the
mutex is released, the guest could e.g. reset the ring and release the
buffer which we're in the process of read()ing into.

So, yes - dropping the mutex during read() in the zero-copy patch isn't
safe.

Another potential concern is that if we drop the mutex, the guest thread
could delete an I/O handler while the I/O thread is in the I/O handler
loop in main_loop_wait(). However, this seems to have been coded to
handle this situation - the I/O handler would only be marked as deleted,
and ignored by the loop.
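
The mark-as-deleted pattern looks roughly like this. It is a simplified single-threaded Python model with hypothetical names, not the actual main_loop_wait() code; the removal is triggered from inside a callback here to stand in for a removal that happens while the loop is walking the list.

```python
class IOHandler:
    def __init__(self, fd, callback):
        self.fd = fd
        self.callback = callback
        self.deleted = False     # set instead of unlinking the record

handlers = []

def add_handler(fd, callback):
    handlers.append(IOHandler(fd, callback))

def remove_handler(fd):
    # Only mark the record; the loop may be walking the list right now.
    for h in handlers:
        if h.fd == fd:
            h.deleted = True

def run_handlers(ready_fds):
    # Walk a snapshot so handlers added by a callback wait for the next pass.
    for h in list(handlers):
        if h.deleted or h.fd not in ready_fds:
            continue             # marked records are skipped, never called
        h.callback()
    # Reap marked records once no callback is running.
    handlers[:] = [h for h in handlers if not h.deleted]
```

A handler removed halfway through a pass is never invoked afterwards, and its record is only freed once the pass is over.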

I think it's safe.  Still I don't feel good about it.

  5) Eliminating an extra copy on the host->guest path only makes a
     barely measurable difference.

That's expected on a host->guest test. Zero copy is mostly important for guest->external, and with zerocopy already enabled in the guest (sendfile or nfs server workloads).

Hmm, could you elaborate on that?

The copy we're eliminating here is an intermediate copy from tapfd into
a buffer before copying to a guest buffer. It doesn't give you zero-copy
as we still copy from kernel space to user space and vice-versa.

So long as we're eliminating intermediate copies, each elimination does not bring us much. It's the elimination of the last copy that brings the benefit (actually, the last copy for each separate L2 cache; need to test on loaded multisocket hosts).

There are broadly three workload categories wrt copying:

- server-side static http/nfs/smb protocols, serving to clients outside the host: guest is already copyless; serving from cache-cold buffers
- normal network servers that actually process their data: you'll have a guest kernel/user copy, and in any case, guest protocol processing means that the potential for gains from eliminating copies is limited
- guest/host benchmarks, which do a copy from a single buffer which is always in L1: zero copy is not going to show a gain (perhaps the opposite)

To get the first workload type optimized, we have to get the entire path copyless.

--
error compiling committee.c: too many arguments to function
