Mark McLoughlin wrote:
By removing the tx timer altogether and doing all the copies in the I/O thread, we can keep the I/O churning away in parallel with the guest generating more I/O.
On a multi-socket machine, you may also be doing the copy on the wrong cache. We're also now increasing latency, and serializing all NICs on one thread.
Of course, the only true answer is to avoid the copy completely, but we aren't there yet. I'm not sure how to trade off the benefits below against the problems above.
In my tests, this significantly increases guest->host throughput, causes a minor increase in host->guest throughput, reduces CPU utilization somewhat and greatly reduces roundtrip times. Even aside from the benchmark results, removing the arbitrary 150us timer is a nicer option than coming up with a heuristic to make it vary according to load. Finally, on kernels which don't have a suitably low posix timer latency, we won't be scuppered by effectively having e.g. a 1ms timer. Note, this highlights that the I/O thread may become a scalability concern and we might want to consider e.g. an I/O thread per device. Note also that when tuning for a specific workload, which CPU the I/O thread is pinned to is important.
This is a significant drawback. Maybe we need a thread per virtio NIC to do the copying, and affine it to the current CPU before handing off? But no, this will either serialize the vcpu with the copy thread or force the vcpu to migrate.
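To make the pinning idea concrete, here is a minimal sketch of restricting a worker thread to a single CPU on Linux. It is an illustration only and not QEMU code: the helper names pin_current_thread and run_pinned_worker are hypothetical, and it assumes a Linux host, where os.sched_setaffinity with pid 0 applies to the calling thread.

```python
import os
import threading


def pin_current_thread(cpu):
    """Restrict the calling thread to one CPU and return its new affinity set.

    Assumption: Linux, where pid 0 means "the calling thread".
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def run_pinned_worker(cpu):
    """Start a worker thread, pin it to `cpu`, and report the affinity it saw."""
    result = {}

    def worker():
        # A real copy thread would do its work here, after pinning itself.
        result["affinity"] = pin_current_thread(cpu)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result["affinity"]


if __name__ == "__main__":
    # Pick a CPU the process is already allowed to run on.
    cpu = min(os.sched_getaffinity(0))
    print("worker affinity:", run_pinned_worker(cpu))
    print("main thread affinity:", os.sched_getaffinity(0))
```

Only the worker's affinity changes here; the thread that hands off work keeps its own mask. That mirrors the tradeoff above: the copy can be placed on the vcpu's CPU, but then the two threads compete for it.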
--
error compiling committee.c: too many arguments to function
