Mark McLoughlin wrote:
On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
Mark McLoughlin wrote:
Hey,

The main patch in this series is 5/6 - it just kills off the
virtio_net tx mitigation timer and does all the tx I/O in the
I/O thread.

What will it do to small packet, multi-flow loads (simulated by ping -f -l 30 $external)?

It should improve the latency - the packets will be flushed more quickly
than the 150us timeout without blocking the guest.


But it will increase overhead, since suddenly we aren't queueing anymore. One vmexit per small packet.


Where does the benefit come from?

There are two things going on here, I think.

First is that the timer affects latency, removing the timeout helps
that.

If the timer affects latency, then something is very wrong. We're lacking an adjustable window.

The way I see it, the notification window should be adjusted according to the current workload. If the link is idle, the window should be one packet -- notify as soon as something is queued. As the workload increases, the window increases to (safety_factor * allowable_latency / packet_rate). The timer is set to allowable_latency to catch changes in workload.

For example:

- allowable_latency 1ms (implies 1K vmexits/sec desired)
- current packet_rate 20K packets/sec
- safety_factor 0.8

So we request notifications every 0.8 * 20K/sec * 1 ms = 16 packets, and set the timer to 1 ms. Usually we get a notification every 16 packets, just before timer expiration. If the workload increases, we get notifications sooner, so we increase the window. If the workload drops, the timer fires and we decrease the window.

The timer should never fire on an all-out benchmark, or in a ping test.

Second is that currently, when we fill up the ring, we block the guest
vcpu and flush. Thus, while we're copying an entire ring full of packets,
the guest isn't making progress. Doing the copying in the I/O thread
helps there.

We're hurting our cache, and this won't work well with many NICs. At the very least this should be done in a dedicated thread. It's also going to damage latency.

The only real fix is to avoid the copy altogether.

Note - the only net I/O we currently do in the vcpu thread is when the
guest is saturating the link. At any other time, all the I/O is done in
the I/O thread by virtue of the timer.

This is fundamental brokenness, as mentioned above, in my non-networking-expert opinion.

Is the overhead of managing the timer too high, or does it fire too
late and so we sleep?  If the latter, can we tune it dynamically?

For example, if the guest sees it is making a lot of progress without the host catching up (waiting on the tx timer), it can kick_I_really_mean_this_now(), to get the host to notice.

It does that already - if the ring fills up, the guest forces a kick
which causes the host to flush the ring in the vcpu thread.

Should happen some time before the ring fills up. Especially if we make the flushing async by offloading it to some other thread.

--
error compiling committee.c: too many arguments to function
