On Mon, Mar 04, 2013 at 08:37:38PM +0100, Jan Kiszka wrote:
> On 2013-03-04 20:33, Gleb Natapov wrote:
> > On Mon, Mar 04, 2013 at 08:23:52PM +0100, Jan Kiszka wrote:
> >> On 2013-03-04 19:39, Gleb Natapov wrote:
> >>> On Mon, Mar 04, 2013 at 07:08:08PM +0100, Jan Kiszka wrote:
> >>>> On 2013-03-04 18:56, Gleb Natapov wrote:
> >>>>> On Mon, Mar 04, 2013 at 03:25:47PM +0100, Jan Kiszka wrote:
> >>>>>> On 2013-03-04 15:15, Gleb Natapov wrote:
> >>>>>>> On Mon, Mar 04, 2013 at 03:09:51PM +0100, Jan Kiszka wrote:
> >>>>>>>> On 2013-03-04 14:22, Gleb Natapov wrote:
> >>>>>>>>> On Thu, Feb 28, 2013 at 10:44:47AM +0100, Jan Kiszka wrote:
> >>>>>>>>>> The logic for calculating the value with which we call kvm_set_cr0/4 was
> >>>>>>>>>> broken (will definitely be visible with nested unrestricted guest mode
> >>>>>>>>>> support). Also, we performed the check regarding CR0_ALWAYSON too early
> >>>>>>>>>> when in guest mode.
> >>>>>>>>>>
> >>>>>>>>>> What really needs to be done on both CR0 and CR4 is to mask out L1-owned
> >>>>>>>>>> bits and merge them in from GUEST_CR0/4. In contrast, arch.cr0/4 and
> >>>>>>>>>> arch.cr0/4_guest_owned_bits contain the mangled L0+L1 state and, thus,
> >>>>>>>>>> are not suited as input.
> >>>>>>>>>>
> >>>>>>>>>> For both CRs, we can then apply the check against VMXON_CRx_ALWAYSON and
> >>>>>>>>>> refuse the update if it fails. To be fully consistent, we now implement
> >>>>>>>>>> this check for CR4 as well.
> >>>>>>>>>>
> >>>>>>>>>> Finally, we have to set the shadow to the value L2 wanted to write
> >>>>>>>>>> originally.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jan Kiszka <[email protected]>
> >>>>>>>>>> ---
> >>>>>>>>>>
> >>>>>>>>>> Found while making unrestricted guest mode work. Not sure what impact
> >>>>>>>>>> the bugs had on the current feature level, if any.
> >>>>>>>>>>
> >>>>>>>>>> For interested folks, I've pushed my nEPT environment here:
> >>>>>>>>>>
> >>>>>>>>>> git://git.kiszka.org/linux-kvm.git nept-hacking
> >>>>>>>>>>
> >>>>>>>>>> arch/x86/kvm/vmx.c | 49 ++++++++++++++++++++++++++++++-------------------
> >>>>>>>>>> 1 files changed, 30 insertions(+), 19 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>>>>>>>>> index 7cc566b..d1dac08 100644
> >>>>>>>>>> --- a/arch/x86/kvm/vmx.c
> >>>>>>>>>> +++ b/arch/x86/kvm/vmx.c
> >>>>>>>>>> @@ -4605,37 +4605,48 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
> >>>>>>>>>> /* called to set cr0 as appropriate for a mov-to-cr0 exit. */
> >>>>>>>>>> static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
> >>>>>>>>>> {
> >>>>>>>>>> - if (to_vmx(vcpu)->nested.vmxon &&
> >>>>>>>>>> - ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON))
> >>>>>>>>>> - return 1;
> >>>>>>>>>> -
> >>>>>>>>>> if (is_guest_mode(vcpu)) {
> >>>>>>>>>> - /*
> >>>>>>>>>> - * We get here when L2 changed cr0 in a way that did not change
> >>>>>>>>>> - * any of L1's shadowed bits (see nested_vmx_exit_handled_cr),
> >>>>>>>>>> - * but did change L0 shadowed bits. This can currently happen
> >>>>>>>>>> - * with the TS bit: L0 may want to leave TS on (for lazy fpu
> >>>>>>>>>> - * loading) while pretending to allow the guest to change it.
> >>>>>>>>>> - */
> >>>>>>>>> Can't say I understand this patch yet, but it looks like the comment is
> >>>>>>>>> still valid. Why have you removed it?
> >>>>>>>>
> >>>>>>>> L0 allows L1 or L2 at most to own TS, the rest is host-owned. I think
> >>>>>>>> the comment was always misleading.
> >>>>>>>>
> >>>>>>> I do not see how it is misleading. For everything but TS we will not get
> >>>>>>> here (if L1 is KVM). For TS we will get here if L1 allows L2 to change
> >>>>>>> it, but L0 does not.
> >>>>>>
> >>>>>> For everything *but guest-owned* we will get here, thus for most CR0
> >>>>>> accesses (bit-wise, not regarding frequency).
> >>>>>>
> >>>>> I do not see how. If a bit is trapped by L1 we will not get here. We will
> >>>>> do a vmexit to L1 instead. nested_vmx_exit_handled_cr() checks this
> >>>>> condition. I am not arguing about your code (didn't grok it yet), but the
> >>>>> comment still makes sense to me.
> >>>>
> >>>> "We get here when L2 changed cr0 in a way that did not change any of
> >>>> L1's shadowed bits (see nested_vmx_exit_handled_cr), but did change L0
> >>>> shadowed bits." That I can agree with. But the rest about TS is just
> >>>> misleading, as we trap _every_ change in L0 - except for TS under certain
> >>>> conditions. The old code was tested against TS only, which is what the
> >>>> comment reflects.
> >>>>
> >>> TS is just an example of how we can get here with KVM on KVM. Obviously
> >>> other hypervisors may have a different configuration. L1 may allow full
> >>> guest access to CR0, and then each CR0 write by L2 will be handled here.
> >>> Under what other condition do "we trap _every_ change in L0 - except for
> >>> TS" here?
> >>
> >> On FPU activation:
> >>
> >> cr0_guest_owned_bits = X86_CR0_TS;
> >>
> >> And on FPU deactivation:
> >>
> >> cr0_guest_owned_bits = 0;
> >>
> > That's exactly the TS case that the comment explains. Note that
> > CR0_GUEST_HOST_MASK = ~cr0_guest_owned_bits.
>
> Again, it's the inverse of what the comment suggests: we enter
> handle_set_cr0 for every change on CR0 that doesn't match the shadow -
> except when TS was given to the guest by both L1 and L0 (or TS isn't
> changed as well).
That doesn't make sense to me. I am not even sure what you are saying,
since you do not specify which shadow is matched. From the code I see
that on a CR0 exit from L2 to L0 we check whether L2 tries to change CR0
bits that L1 claims as its own, and do a #vmexit to L1 if so:
	if (vmcs12->cr0_guest_host_mask & (val ^ vmcs12->cr0_read_shadow))
		return 1;
We never reach handle_set_cr0() in that case.
Can you provide an example with actual values for L2/L1/L0 of what you
are trying to say?
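A minimal user-space model of the routing described above, with concrete values for the TS case and for an L1 that hands L2 full CR0 access. Everything here is hypothetical: `struct cr0_ctl`, `route_l2_cr0_write()` and `check_examples()` are illustrative names, not KVM code, and L0's controls are simplified to a single mask plus the CR0 value L2 currently sees.

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE (1UL << 0)
#define X86_CR0_TS (1UL << 3)
#define X86_CR0_NE (1UL << 5)
#define X86_CR0_PG (1UL << 31)

/* Hypothetical, simplified stand-in for one level's CR0 controls. */
struct cr0_ctl {
	unsigned long guest_host_mask;	/* 1 = bit owned (trapped) by this level */
	unsigned long read_shadow;	/* value the guest reads for owned bits */
};

enum cr0_route { ROUTE_NO_EXIT, ROUTE_TO_L1, ROUTE_TO_L0 };

/* True if writing val flips any bit this level owns. */
static bool changes_owned_bits(const struct cr0_ctl *c, unsigned long val)
{
	return (c->guest_host_mask & (val ^ c->read_shadow)) != 0;
}

/* Route an L2 write of val to CR0: l1 models vmcs12, l0 models L0's mask. */
static enum cr0_route route_l2_cr0_write(const struct cr0_ctl *l0,
					 const struct cr0_ctl *l1,
					 unsigned long val)
{
	if (changes_owned_bits(l1, val))
		return ROUTE_TO_L1;	/* same test as the vmcs12 check above */
	if (changes_owned_bits(l0, val))
		return ROUTE_TO_L0;	/* L0 handles it in handle_set_cr0() */
	return ROUTE_NO_EXIT;		/* no level traps the changed bits */
}

static void check_examples(void)
{
	const unsigned long cur = X86_CR0_PE | X86_CR0_NE | X86_CR0_PG;
	/* L1 = KVM with L2's FPU active: L1 owns everything but TS. */
	struct cr0_ctl l1_kvm = { ~X86_CR0_TS, cur };
	/* L0 with the FPU deactivated: cr0_guest_owned_bits = 0. */
	struct cr0_ctl l0_lazy = { ~0UL, cur };
	/* L0 with the FPU active: cr0_guest_owned_bits = X86_CR0_TS. */
	struct cr0_ctl l0_fpu = { ~X86_CR0_TS, cur };
	/* A (non-KVM) L1 that grants L2 full CR0 access. */
	struct cr0_ctl l1_open = { 0UL, cur };

	/* L2 sets TS: not L1's bit, but L0 keeps it -> handle_set_cr0(). */
	assert(route_l2_cr0_write(&l0_lazy, &l1_kvm, cur | X86_CR0_TS) == ROUTE_TO_L0);
	/* Same write while L0's FPU is active: nobody traps it. */
	assert(route_l2_cr0_write(&l0_fpu, &l1_kvm, cur | X86_CR0_TS) == ROUTE_NO_EXIT);
	/* L2 clears PG: L1 owns PG -> reflected to L1. */
	assert(route_l2_cr0_write(&l0_lazy, &l1_kvm, cur & ~X86_CR0_PG) == ROUTE_TO_L1);
	/* Open L1: an L0-owned change such as PG lands in handle_set_cr0(). */
	assert(route_l2_cr0_write(&l0_lazy, &l1_open, cur & ~X86_CR0_PG) == ROUTE_TO_L0);
}
```

Calling check_examples() from any main() exercises the four cases. The first is the TS case the comment describes for KVM as L1; the last shows that with an L1 which does not trap CR0, every change L0 traps ends up in handle_set_cr0().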
>
> >
> >>>
> >>>> If you prefer, I'll leave part one in.
> >>>>
> >>> Please do so. Without the comment it is not obvious why the exit condition
> >>> is not checked here. I still do not see why you object to the TS part.
> >>
> >> It describes a corner case in a way that suggests this is the only
> >> reason why we get here.
> >>
> > For KVM on KVM it is.
>
> Which is, sorry, irrelevant.
>
As an example that helps developers understand the code, it pretty much
is. I agree that "This can currently happen..." should be replaced with
something like "With KVM as L1 this can currently happen...".
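For reference, a user-space sketch of the computation the patch description outlines: take the L1-owned bits from GUEST_CR0 and the remaining bits from the value L2 wrote, apply the VMXON_CR0_ALWAYSON check to that merged value, and keep L2's original value for the read shadow. `merge_l2_cr0()` and `struct cr0_update` are hypothetical names, `VMXON_CR0_ALWAYSON` is assumed to be PE | NE | PG, and the example assumes L0 traps TS so that the write reaches L0 at all.

```c
#include <assert.h>
#include <stdbool.h>

#define X86_CR0_PE (1UL << 0)
#define X86_CR0_TS (1UL << 3)
#define X86_CR0_NE (1UL << 5)
#define X86_CR0_PG (1UL << 31)
/* Assumption: the bits VMX operation requires to stay set in CR0. */
#define VMXON_CR0_ALWAYSON (X86_CR0_PE | X86_CR0_PG | X86_CR0_NE)

/* Hypothetical result: what L0 would program for an L2 CR0 write. */
struct cr0_update {
	unsigned long effective;	/* value that would go to kvm_set_cr0() */
	unsigned long shadow;		/* value L2 reads back (CR0_READ_SHADOW) */
};

/*
 * L1-owned bits (l1_mask) come from vmcs12's GUEST_CR0 (l1_guest_cr0),
 * all other bits from val. The ALWAYSON check runs on the merged value;
 * the shadow keeps val. Returns false if the update must be refused.
 */
static bool merge_l2_cr0(unsigned long val, unsigned long l1_mask,
			 unsigned long l1_guest_cr0, struct cr0_update *out)
{
	unsigned long effective = (val & ~l1_mask) | (l1_guest_cr0 & l1_mask);

	if ((effective & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)
		return false;
	out->effective = effective;
	out->shadow = val;
	return true;
}

static void check_merge_example(void)
{
	/* L1 owns PE, NE and PG; TS and the other bits belong to L2. */
	const unsigned long l1_mask = X86_CR0_PE | X86_CR0_NE | X86_CR0_PG;
	/* L1 runs L2 with NE set but shows NE = 0 in its read shadow. */
	const unsigned long l1_guest_cr0 = X86_CR0_PE | X86_CR0_NE | X86_CR0_PG;
	/* L2 leaves NE clear (as it reads it) and sets TS. */
	const unsigned long val = X86_CR0_PE | X86_CR0_PG | X86_CR0_TS;
	struct cr0_update u;

	/* Checking the raw val, as the old code did, would refuse the write. */
	assert((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON);
	/* The merged value passes, and L2 still reads back what it wrote. */
	assert(merge_l2_cr0(val, l1_mask, l1_guest_cr0, &u));
	assert(u.effective == (l1_guest_cr0 | X86_CR0_TS));
	assert(u.shadow == val);
}
```

The example shows why the ALWAYSON check has to run after the merge: the raw value L2 wrote lacks NE only because L1 hides that bit through its read shadow.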
--
Gleb.