On Thu, Apr 28, 2011 at 08:00:57PM -0700, Zachary Amsden wrote:
> On 04/28/2011 01:20 PM, Joerg Roedel wrote:
>> This code checks how many guest tsc cycles have passed since this vCPU
>> was de-scheduled last time (and before it is running again). So since
>> the vCPU hasn't run in the meantime it had no chance to change its TSC.
>> Further, the other parameters like the TSC offset and the scaling
>> multiplier havn't changed too, so the only variable in the guest-tsc
>> calculation is the host-tsc.
>> So this calculation using the guest-tsc can detect backwards going
>> host-tsc as good as the old one. The benefit here is that we can feed
>> consistent values into adjust_tsc_offset().
>>
>
> While true, this is more complex than the original code. The original
> code here doesn't try to actually compensate for the guest TSC
> difference - instead what it does is NULL any discovered host TSC delta:
Why should KVM care about the host-tsc going backwards when the
guest-tsc moves forward? I think the bottom-line of this code is to make
sure the guest-tsc does not jump around (or even jumps backwards).
I also don't agree that this is additional complexity. With these
changes we were able to get rid of the last_host_tsc variable which is
actually a simplification.
As I see it, there are two situations here:
1) On hosts without tsc-scaling the value of tsc_delta calculated from
the guest-tsc values is the same as when calculated with host-tsc
values, because the guest-tsc only differs by an offset from the
host-tsc.
2) On hosts with tsc-scaling these two tsc_delta values may be
different. If the guest has a different tsc-freq as the host, we
can't simply adjust the tsc by an offset calculated from the host-tsc
values. So tsc_delta has to be calculated using the guest-tsc.
>
> if (tsc_delta < 0)
> mark_tsc_unstable("KVM discovered backwards TSC");
> if (check_tsc_unstable()) {
> kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
> vcpu->arch.tsc_catchup = 1;
> }
> kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> Erasing that delta also erases elapsed time since the VCPU has last been
> run, which isn't desirable, so it then sets tsc_catchup mode, which will
> restore the proper TSC. The request here triggers code which later
> updates the TSC offset again.
I just looked again at the tsc_catchup thing. Since the clock-update is
forced in the code above, and the clock-update-code adjusts the tsc
itself if necessary, is it really necessary to do this here at all?
> I left compute_guest_tsc in place to recompute time in guest units here,
> even if the underlying hardware rate changes.
>
> /*
> * We may have to catch up the TSC to match elapsed wall clock
> * time for two reasons, even if kvmclock is used.
> * 1) CPU could have been running below the maximum TSC rate
> * 2) Broken TSC compensation resets the base at each VCPU
> * entry to avoid unknown leaps of TSC even when running
> * again on the same CPU. This may cause apparent elapsed
> * time to disappear, and the guest to stand still or run
> * very slowly.
> */
> if (vcpu->tsc_catchup) {
> u64 tsc = compute_guest_tsc(v, kernel_ns);
> if (tsc > tsc_timestamp) {
> kvm_x86_ops->adjust_tsc_offset(v, tsc -
> tsc_timestamp);
> tsc_timestamp = tsc;
> }
> }
>
> So yeah, the code is getting pretty complex but we'd like to avoid that
> as much as possible - so I would prefer to have the hardware backwards
> compensation separate from the guest rate computation by doing this:
>
> step 1) remove any backwards hardware TSC delta (in hardware units)
> step 2) recompute guest TSC from a stable clock (gotten from kernel_ns)
> and apply adjustment (in guest units)
>
> So it appears you can just share most of the logic of guest TSC catchup
> mode.
>
> What do you think?
With the compensation done in kvm_guest_time_update I see no reason
anymore to adjust the tsc in vcpu_load at all. So if we can remove the
adjustment from there we can switch back to host-tsc there.
The question still is whether this needs to be done in KVM. For older
kernels Jan will take patches that handle this in kvm-kmod. So this code
can probably be removed from vcpu_load. The benefit is that we don't
need to re-introduce the last_host_tsc.
If we still need to call adjust_tsc_offset() in vcpu_load then we have
to do the calculation with the guest_tsc because this gives us a
tsc_delta in units that adjust_tsc_offset expects.
Regards,
Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html