Mark McLoughlin wrote:
On Sun, 2008-11-02 at 11:48 +0200, Avi Kivity wrote:
Mark McLoughlin wrote:
Hey,
The main patch in this series is 5/6 - it just kills off the
virtio_net tx mitigation timer and does all the tx I/O in the
I/O thread.
What will it do to small packet, multi-flow loads (simulated by ping -f
-l 30 $external)?
It should improve the latency - the packets will be flushed more quickly
than the 150us timeout without blocking the guest.
But it will increase overhead, since suddenly we aren't queueing
anymore. One vmexit per small packet.
Where does the benefit come from?
There are two things going on here, I think.
First is that the timer affects latency, removing the timeout helps
that.
If the timer affects latency, then something is very wrong. We're
lacking an adjustable window.
The way I see it, the notification window should be adjusted according
to the current workload. If the link is idle, the window should be one
packet -- notify as soon as something is queued. As the workload
increases, the window increases to (safety_factor * allowable_latency /
packet_rate). The timer is set to allowable_latency to catch changes in
workload.
For example:
- allowable_latency 1ms (implies 1K vmexits/sec desired)
- current packet_rate 20K packets/sec
- safety_factor 0.8
So we request notifications every 0.8 * 20K/s * 1ms = 16 packets, and set
the timer to 1ms. Usually we get a notification every 16 packets, just
before timer expiration. If the workload increases, we get
notifications sooner, so we increase the window. If the workload drops,
the timer fires and we decrease the window.
The timer should never fire on an all-out benchmark, or in a ping test.
Second is that currently when we fill up the ring we block the guest
vcpu and flush. Thus, while we're copying a entire ring full of packets
that guest isn't making progress. Doing the copying in the I/O thread
helps there.
We're hurting our cache, and this won't work well with many NICs. At
the very least this should be done in a dedicated thread. It's also
going to damage latency.
The only real fix is to avoid the copy altogether.
Note - the only net I/O we currently do in the vcpu thread
is when the guest is saturating the link. At any other time, all the I/O
is done in the I/O thread by virtue of the timer.
This is fundamental brokenness, as mentioned above, in my
non-networking-expert opinion.
Is the overhead of managing the timer too high, or does it fire too
late and so we sleep? If the latter, can we tune it dynamically?
For example, if the guest sees it is making a lot of progress without
the host catching up (waiting on the tx timer), it can
kick_I_really_mean_this_now(), to get the host to notice.
It does that already - if the ring fills up, the guest forces a kick
which causes the host to flush the ring in the vcpu thread.
That should happen some time before the ring fills up - especially if we
make the flushing async by offloading it to some other thread.
--
error compiling committee.c: too many arguments to function