Marcelo Tosatti wrote:
I think the comparison is not entirely fair. You're using
KVM_HC_VAPIC_POLL_IRQ ("null" hypercall) and the compiler optimizes that
(on Intel) to only one register read:

        nr = kvm_register_read(vcpu, VCPU_REGS_RAX);

Whereas in a real hypercall for (say) PIO you would need the address,
size, direction and data.

Well, that's probably one of the reasons PIO is slower: the CPU has to set these up, and the kernel has to read them.

Also for PIO/MMIO you're adding this unoptimized lookup to the measurement:

        pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
        if (pio_dev) {
                kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
                complete_pio(vcpu);
                return 1;
        }

Since there are only one or two elements in the list, I don't see how it could be optimized.
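
To illustrate why a list this short leaves little to optimize, here is a minimal sketch of a linear port lookup. It is not the actual vcpu_find_pio_dev() implementation: struct pio_range and find_pio_range() are hypothetical names made up for this example, and the real function also takes the access direction into account.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device record: a port range plus an id. Not a KVM type. */
struct pio_range {
        uint16_t base;  /* first port owned by the device */
        uint16_t len;   /* number of consecutive ports */
        int id;
};

/*
 * Linear scan over the registered ranges. Returns the device whose range
 * fully contains [port, port + size), or NULL if none does.
 */
static const struct pio_range *find_pio_range(const struct pio_range *devs,
                                              size_t count,
                                              uint16_t port, uint16_t size)
{
        for (size_t i = 0; i < count; i++) {
                uint32_t start = devs[i].base;
                uint32_t end = start + devs[i].len;

                if (port >= start && (uint32_t)port + size <= end)
                        return &devs[i];
        }
        return NULL;
}
```

With one or two entries the loop amounts to a couple of compares per exit, so a cleverer data structure would have little left to save.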

Whereas for the hypercall measurement you don't. I believe a fair comparison
would be to have a shared guest/host memory area where you store guest/host
TSC values and then do, on the guest:

        rdtscll(&shared_area->guest_tsc);
        pio/mmio/hypercall
        ... back to host
        rdtscll(&shared_area->host_tsc);

And then calculate the difference (minus the guest's TSC_OFFSET of course)?

I don't understand why you want the host TSC. We're interested in round-trip latency, so you want the guest TSC throughout.
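
A rough sketch of what measuring round-trip latency from one side only could look like. This is not the benchmark under discussion: read_counter(), min_round_trip() and null_op() are hypothetical names, and the operation being timed is passed in as a callback because the real PIO/MMIO/hypercall exit can only be issued from guest kernel code. On x86 the counter is the TSC (via the compiler's __builtin_ia32_rdtsc); elsewhere the sketch falls back to clock_gettime(), which is an assumption made only to keep the example portable.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a counter on the measuring side only (in the guest: the guest TSC). */
static inline uint64_t read_counter(void)
{
#if defined(__x86_64__) || defined(__i386__)
        return __builtin_ia32_rdtsc();
#else
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* The operation under test; in the guest this would trigger the exit. */
typedef void (*exit_op_t)(void);

/*
 * Time op() between two counter reads taken on the same side and keep the
 * smallest delta seen, which filters out interrupts and other noise.
 */
static uint64_t min_round_trip(exit_op_t op, unsigned int iterations)
{
        uint64_t best = UINT64_MAX;

        assert(iterations > 0);
        for (unsigned int i = 0; i < iterations; i++) {
                uint64_t start = read_counter();

                op();

                uint64_t delta = read_counter() - start;
                if (delta < best)
                        best = delta;
        }
        return best;
}

/* Empty operation: measures the overhead of the harness itself. */
static void null_op(void)
{
}
```

Calling min_round_trip(null_op, 1000) gives the baseline cost of the two counter reads, which would be subtracted from the figure obtained with a real exit. Because both reads happen in the guest, no TSC_OFFSET correction is needed.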

--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.
