On 2013-02-17 16:07, Gleb Natapov wrote:
> On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <[email protected]>
>>
>> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
>> vmx_complete_interrupts on L2 exits. This is required because, with
>> direct interrupt injection from L0 to L2, L0 has to update its pending
>> events.
>>
>> Also, we need to allow vmx_cancel_injection when entering L2 in we left
>> to L0. This condition is indirectly derived from the absence of valid
>> vectoring info in vmcs12. We no explicitly clear it if we find out that
>> the L2 exit is not targeting L1 but L0.
>>
> We really need to overhaul how interrupt injection is emulated in nested
> VMX. Why not put pending events into event queue instead of
> get_vmcs12(vcpu)->idt_vectoring_info_field and inject them in usual way.

I was thinking about the same step but felt unsure so far if
vmx_complete_interrupts & Co. do not include any assumptions about the
vmcs configuration that won't match what L1 does. So I went for a
different path first, specifically to avoid impact on these hairy bits
for non-nested mode.

> 
>> Signed-off-by: Jan Kiszka <[email protected]>
>> ---
>>  arch/x86/kvm/vmx.c |   43 +++++++++++++++++++++++++++----------------
>>  1 files changed, 27 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 68a045ae..464b6a5 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -624,6 +624,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
>>                          struct kvm_segment *var, int seg);
>>  static bool guest_state_valid(struct kvm_vcpu *vcpu);
>>  static u32 vmx_segment_access_rights(struct kvm_segment *var);
>> +static void vmx_complete_interrupts(struct vcpu_vmx *vmx);
>>  
>>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>> @@ -6213,9 +6214,19 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>      else
>>              vmx->nested.nested_run_pending = 0;
>>  
>> -    if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
>> -            nested_vmx_vmexit(vcpu);
>> -            return 1;
>> +    if (is_guest_mode(vcpu)) {
>> +            if (nested_vmx_exit_handled(vcpu)) {
>> +                    nested_vmx_vmexit(vcpu);
>> +                    return 1;
>> +            }
>> +            /*
>> +             * Now it's clear, we are leaving to L0. Perform the postponed
>> +             * interrupt completion and clear L1's vectoring info field so
>> +             * that we do not overwrite what L0 wants to inject on
>> +             * re-entry.
>> +             */
>> +            vmx_complete_interrupts(vmx);
>> +            get_vmcs12(vcpu)->idt_vectoring_info_field = 0;
>>      }
>>  
>>      if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
>> @@ -6495,8 +6506,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx 
>> *vmx,
>>  
>>  static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>>  {
>> -    if (is_guest_mode(&vmx->vcpu))
>> -            return;
>>      __vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
>>                                VM_EXIT_INSTRUCTION_LEN,
>>                                IDT_VECTORING_ERROR_CODE);
>> @@ -6504,7 +6513,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx 
>> *vmx)
>>  
>>  static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>>  {
>> -    if (is_guest_mode(vcpu))
>> +    if (is_guest_mode(vcpu) &&
>> +        get_vmcs12(vcpu)->idt_vectoring_info_field &
>> +                    VECTORING_INFO_VALID_MASK)
> Why skip cancel_injection at all? As far as I see we can lose injected
> irq if we do. Consider:
> 
>   io thread                                  vcpu in nested mode
> set irr 200
>                                           clear irr 200 set isr 200
>                                           set 200 in VM_ENTRY_INTR_INFO_FIELD
> set irr 250
> set KVM_REQ_EVENT
>                                           if (KVM_REQ_EVENT)
>                                                   vmx_cancel_injection() <- 
> does nothing

No, it does cancel as vmcs12's idt_vectoring_info_field signals an
invalid state then. Only if we left L2 with valid vectoring info and are
about to reenter, we skip the cancellation - but in that case we didn't
inject anything from L0 previously anyway.

Jan

> 
>                                           clear irr 250 set isr 250
>                                           set 250 in VM_ENTRY_INTR_INFO_FIELD
>                                           vmentry
> 
> So now APIC state is bogus. isr bit 200 is set but vector 200 was never
> injected and actually is lost forever. Next EOI will clear isr 250 and
> isr 200 will block all lower level interrupt forever.
> 
> --
>                       Gleb.
> 


Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to