On 02.10.2010, at 03:56, Alexander Graf wrote:

> 
> On 01.10.2010, at 21:22, Zachary Amsden <[email protected]> wrote:
> 
>> On 10/01/2010 04:46 AM, Alexander Graf wrote:
>>> On 01.10.2010, at 13:21, Nadav Har'El wrote:
>>> 
>>> 
>>>> On Thu, Sep 30, 2010, Zachary Amsden wrote about "Re: TSC in nested SVM 
>>>> and VMX":
>>>> 
>>>>> 1)  When reading an MSR, we are not emulating the L2 guest; we are
>>>>> DIRECTLY reading the MSR for the L1 emulation.  Any emulation of the L2
>>>>> guest is actually done by the code running /inside/ the L1 emulation, so
>>>>> MSR reads for the L2 guest are handled by L1, and MSR reads for the L1
>>>>> guest are handled by L0, which is this code.
>>>>> ...
>>>>> So if we are currently running nested, the L1 tsc_offset is stored in
>>>>> the nested.hsave field; the vmcb which is active is polluted by the L2
>>>>> guest offset, which would be incorrect to return to the L1 emulation.
>>>>> 
>>>> Thanks for the detailed explanation.
>>>> 
>>>> It seems, then, that the nested VMX logic is somewhat different from that
>>>> of the nested SVM. In nested VMX, if a function gets called when running
>>>> L1, the current VMCS will be that of L1 (aka vmcs01), not of its guest L2
>>>> (and I'm not even sure *which* L2 that would be when there are multiple
>>>> L2 guests for the one L1).
>>>> 
>>> If the #vmexit comes while you're in L1, everything works on the L1's vmcb. 
>>> If you hit it while in L2, everything works on the L2's vmcb unless special 
>>> attention is taken.
>>> 
>>> The reason behind the TSC shift is very simple. With the tsc_offset setting 
>>> we're trying to adjust the L1's offset. Adjusting the L1's offset means we 
>>> need to adjust L1 and L2 alike, as the virtual L2's offset == L1 offset + 
>>> vmcb L2 offset, because L2's TSC is also offset by the amount L1 is.
>>> 
>>> So basically what happens is:
>>> 
>>> nested VMRUN:
>>> 
>>>        svm->vmcb->control.tsc_offset += nested_vmcb->control.tsc_offset;
>>> 
>>> please note the +=!
>>> 
>>> 
>>> svm_write_tsc_offset:
>>> 
>>> This gets called when we really want to set the current level's TSC offset 
>>> only, because the guest issued a tsc write. In L2 this means the L2's value.
>>> 
>>>        if (is_nested(svm)) {
>>>                g_tsc_offset = svm->vmcb->control.tsc_offset -
>>>                               svm->nested.hsave->control.tsc_offset;
>>> 
>>> Remember the difference between L1 and L2.
>>> 
>>>                svm->nested.hsave->control.tsc_offset = offset;
>>> 
>>> Set L1 to the new offset
>>> 
>>>        }
>>> 
>>>        svm->vmcb->control.tsc_offset = offset + g_tsc_offset;
>>> 
>>> Set L2 to new offset + delta.
>>> 
>>> 
>>> So what this function does is that it treats TSC writes as L1 writes even 
>>> while in L2 and adjusts L2 accordingly. Joerg, this sounds fishy to me. Are 
>>> you sure this is intended and works when L1 doesn't intercept MSR writes to 
>>> TSC?
>>> 
>> 
>> L1 must intercept MSR writes to TSC for this to work.  It does, so all is 
>> well.
> 
> Sure, in nested kvm all is fine because we becer

never

> hit the above code path. But other nypervisors

hypervisors

> might not intercept tsc writes which should only be reflected in an l2 tsc 
> offset change, no?
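
To make the two behaviours concrete, here is a rough standalone model of the
offset arithmetic. It is only a sketch, not kernel code: struct tsc_model and
the model_* functions are made-up names, the fields merely stand in for
svm->vmcb->control.tsc_offset and svm->nested.hsave->control.tsc_offset, and
the assumption that hsave holds the L1 offset from nested VMRUN onward is
mine. model_write_as_l2 is my guess at what "only reflected in an L2 tsc
offset change" would mean, not an existing function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the vmcb fields discussed in this thread. */
struct tsc_model {
    uint64_t active_offset; /* models svm->vmcb->control.tsc_offset */
    uint64_t hsave_offset;  /* models svm->nested.hsave->control.tsc_offset */
    int nested;             /* models is_nested(svm) */
};

/* Nested VMRUN: remember the L1 offset, then add the L2 delta (the "+="). */
void model_nested_vmrun(struct tsc_model *m, uint64_t l2_delta)
{
    assert(!m->nested);                 /* this model covers one level only */
    m->hsave_offset = m->active_offset; /* assumption: hsave keeps L1's value */
    m->active_offset += l2_delta;
    m->nested = 1;
}

/* Quoted behaviour: treat the write as an L1 write and keep the L2 delta. */
void model_write_as_l1(struct tsc_model *m, uint64_t offset)
{
    uint64_t g_tsc_offset = 0;

    if (m->nested) {
        /* difference between L1 and L2 before the write */
        g_tsc_offset = m->active_offset - m->hsave_offset;
        m->hsave_offset = offset;       /* L1 gets the new offset */
    }
    m->active_offset = offset + g_tsc_offset; /* L2 = new offset + delta */
}

/* Hypothetical alternative: an unintercepted L2 write touches only L2. */
void model_write_as_l2(struct tsc_model *m, uint64_t offset)
{
    m->active_offset = offset;          /* hsave (L1) is left alone */
}
```

With an L1 offset of 100 and an L2 delta of 30, a write of 500 while nested
leaves hsave = 500 and active = 530 in the first variant, but hsave = 100 and
active = 500 in the second. The first one moves L1's clock too, which is the
part I am asking about.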

Note to self: proof-read mails when writing from a phone.


Alex

