Alex Williamson wrote:
This is an attempt to improve the latency of virtio-net while not hurting
throughput. I wanted to try moving packet TX into a different thread
so we can quickly return to the guest after it kicks us to send packets
out. I also switched the order of when the tx_timer comes into play, so
we can get an initial burst of packets out, then wait for the timer to
fire and notify us if there's more to do. Here's what it does for me
(average of 5 runs each, testing to a remote system on a 1Gb network):
netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s = 99.58%
netperf TCP_RR: 2028.72/s -> 3927.99/s = 193.62%
tbench: 92.99MB/s -> 99.97MB/s = 107.51%
I'd be interested to hear if it helps or hurts anyone else. Thanks,
My worry with this change is that it increases cpu utilization even more
than it increases bandwidth, so that our bits/cycle measure decreases.
The descriptors (and perhaps data) are likely on the same cache as the
vcpu, and moving the transmit to the iothread will cause them to move to
the iothread's cache.
My preferred approach to increasing both bandwidth and bits/cycle (the
latter figure is more important IMO, unfortunately benchmarks don't
measure it) is to aio-enable tap and raw sockets. The vcpu thread would
only touch the packet descriptors (not data) and submit all packets in
one io_submit() call. Unfortunately a huge amount of work is needed to
pull this off.
--
error compiling committee.c: too many arguments to function