>>> On Mon, May 7, 2007 at  5:57 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote: 
> Gregory Haskins wrote:
>> The VCPU executes synchronously w.r.t. userspace today, and therefore
>> interrupt injection is pretty straight forward.  However, we will soon need
>> to be able to inject interrupts asynchronous to the execution of the VCPU
>> due to the introduction of SMP, paravirtualized drivers, and asynchronous
>> hypercalls.  This patch adds support to the interrupt mechanism to force
>> a VCPU to VMEXIT when a new interrupt is pending.
>>
>> Signed-off-by: Gregory Haskins <[EMAIL PROTECTED]>
>> ---
>>
>>  drivers/kvm/kvm.h      |    5 +++
>>  drivers/kvm/kvm_main.c |   74 +++++++++++++++++++++++++++++++++++++++++++++++-
>>  drivers/kvm/svm.c      |   43 ++++++++++++++++++++++++++++
>>  drivers/kvm/vmx.c      |   43 ++++++++++++++++++++++++++++
>>  4 files changed, 164 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/kvm/kvm.h b/drivers/kvm/kvm.h
>> index d41d653..15c8bec 100644
>> --- a/drivers/kvm/kvm.h
>> +++ b/drivers/kvm/kvm.h
>> @@ -321,6 +321,8 @@ void kvm_io_bus_register_dev(struct kvm_io_bus *bus,
>>  
>>  #define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
>>  
>> +#define KVM_SIGNAL_VIRTUAL_INTERRUPT 33 /* Hardcoded for now */
>> +
>>  /*
>>   * structure for maintaining info for interrupting an executing VCPU
>>   */
>> @@ -329,6 +331,9 @@ struct kvm_vcpu_irq {
>>      struct kvm_irqdevice dev;
>>      int                  pending;
>>      int                  deferred;
>> +    struct task_struct  *task;
>> +    int                  signo;
>> +    int                  guest_mode;
>>  };
>>  
>>  struct kvm_vcpu {
>> diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
>> index 9aeb2f7..6acbd9b 100644
>> --- a/drivers/kvm/kvm_main.c
>> +++ b/drivers/kvm/kvm_main.c
>> @@ -304,6 +304,10 @@ static struct kvm *kvm_create_vm(void)
>>              memset(&vcpu->irq, 0, sizeof(vcpu->irq));
>>              spin_lock_init(&vcpu->irq.lock);
>>              vcpu->irq.deferred = -1;
>> +            /*
>> +             * This should be settable by userspace someday
>> +             */
>> +            vcpu->irq.signo = KVM_SIGNAL_VIRTUAL_INTERRUPT;
>>   
> 
> This needs to be fixed prior to merging.  Hopefully not by setting the 
> signal number, but by making the vcpu fd writable (userspace can attach 
> a signal to the fd if it wishes).
> 
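For reference, if the vcpu fd grew fasync/O_ASYNC support (which this patch 
does not add), userspace could attach a signal to it with the standard fcntl() 
sequence.  Rough sketch only; arm_vcpu_signal() and vcpu_fd are made-up names:

    #define _GNU_SOURCE             /* for F_SETSIG */
    #include <fcntl.h>
    #include <signal.h>
    #include <unistd.h>

    /*
     * Sketch only: assumes the vcpu fd implements fasync (O_ASYNC), which
     * the current code does not.  vcpu_fd would be the fd returned by
     * KVM_CREATE_VCPU.
     */
    static void arm_vcpu_signal(int vcpu_fd, int signo)
    {
            fcntl(vcpu_fd, F_SETOWN, getpid());     /* deliver to this task */
            fcntl(vcpu_fd, F_SETSIG, signo);        /* which signal to send */
            fcntl(vcpu_fd, F_SETFL, fcntl(vcpu_fd, F_GETFL) | O_ASYNC);
    }
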
>>  
>>              vcpu->cpu = -1;
>>              vcpu->kvm = kvm;
>> @@ -366,13 +370,20 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
>>  
>>  static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
>>  {
>> +    unsigned long irqsave;
>> +
>>      if (!vcpu->vmcs)
>>              return;
>>  
>>      vcpu_load(vcpu);
>>      kvm_mmu_destroy(vcpu);
>>      vcpu_put(vcpu);
>> +
>> +    spin_lock_irqsave(&vcpu->irq.lock, irqsave);
>> +    vcpu->irq.task = NULL;
>> +    spin_unlock_irqrestore(&vcpu->irq.lock, irqsave);
>>   
> 
> Can irq.task be non-NULL here at all?  Also, we only free vcpus when we 
> destroy the vm, and paravirt drivers would hopefully hold a ref to the 
> vm, so there's nobody to race against here.

I am perhaps being a bit overzealous here.  What I found in practice is that 
the LVTT can screw things up on shutdown, so I was being pretty conservative on 
the synchronization here.  

> 
>>      kvm_irqdevice_destructor(&vcpu->irq.dev);
>>  
>> @@ -1868,6 +1880,10 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>              kvm_arch_ops->decache_regs(vcpu);
>>      }
>>  
>> +    spin_lock_irqsave(&vcpu->irq.lock, irqsaved);
>> +    vcpu->irq.task = current;
>> +    spin_unlock_irqrestore(&vcpu->irq.lock, irqsaved);
>> +
>>   
> 
> Just assignment + __smp_wmb().

(This comment applies to all of the subsequent reviews where memory barriers 
are recommended instead of locks:)

I can't quite convince myself that all of these critical sections are correct 
with just a barrier instead of a full-blown lock.  I would prefer to be 
conservative and leave them as locks for now; someone with better insight 
could make a second pass and optimize the locks into barriers where 
appropriate.  I am simply not confident that I could make that change myself 
without introducing races.  If you insist on making the changes before the 
code is accepted, ok.  Just note that I am not comfortable ;)
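
Just so we are talking about the same transformation: I read the suggestion as 
replacing the locked assignment above with something roughly like this (sketch 
only, and exactly the kind of change I would rather have a second set of eyes 
on):

    /*
     * Barrier-based sketch: publish irq.task before anything the interrupt
     * path might look at afterwards.  The reader side (kvm_vcpu_intr) would
     * presumably need a matching smp_rmb().
     */
    vcpu->irq.task = current;
    smp_wmb();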

> 
>> +/*
>>   * This function will be invoked whenever the vcpu->irq.dev raises its INTR
>>   * line
>>   */
>> @@ -2318,10 +2348,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>>  {
>>      struct kvm_vcpu *vcpu = (struct kvm_vcpu*)this->private;
>>      unsigned long flags;
>> +    int direct_ipi = -1;
>>  
>>      spin_lock_irqsave(&vcpu->irq.lock, flags);
>> -    __set_bit(pin, &vcpu->irq.pending);
>> +
>> +    if (!test_bit(pin, &vcpu->irq.pending)) {
>> +            /*
>> +             * Record the change..
>> +             */
>> +            __set_bit(pin, &vcpu->irq.pending);
>> +
>> +            /*
>> +             * then wake up the vcpu (if necessary)
>> +             */
>> +            if (vcpu->irq.task && (vcpu->irq.task != current)) {
>> +                    if (vcpu->irq.guest_mode) {
>> +                            /*
>> +                             * If we are in guest mode, we can optimize
>> +                             * the IPI by executing a function directly
>> +                             * on the owning processor.
>> +                             */
>> +                            direct_ipi = task_cpu(vcpu->irq.task);
>> +                            BUG_ON(direct_ipi == smp_processor_id());
>> +                    } else
>> +                            /*
>> +                             * otherwise, we must assume that we could be
>> +                             * blocked anywhere, including userspace. Send
>> +                             * a signal to give everyone a chance to get
>> +                             * notification
>> +                             */
>> +                            send_sig(vcpu->irq.signo, vcpu->irq.task, 0);
>> +            }
>> +    }
>> +
>>      spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>> +
>> +    if (direct_ipi != -1) {
>> +            /*
>> +             * Not sure if disabling preemption is needed.
>> +             * The kick_process() code does this so I copied it
>> +             */
>> +            preempt_disable();
>> +            smp_call_function_single(direct_ipi,
>> +                                     kvm_vcpu_guest_intr,
>> +                                     vcpu, 0, 0);
>> +            preempt_enable();
>> +    }
>>   
> 
> I see why you must issue the IPI outside the spin_lock_irqsave(), but 
> aren't you now opening a race?  vcpu enters guest mode, irq on other 
> cpu, irq sets direct_ipi to wakeup guest, releases lock, vcpu exits to 
> userspace (or migrates to another cpu), ipi is issued but nobody cares.

It's subtle, but I think it's ok.  The race is actually against the setting of 
the irq.pending bit.  This *must* happen inside the lock, or the guest could 
exit and miss the interrupt.  Once the pending bit is set, however, the guest 
can be woken up in any old fashion and the behavior is still correct.  If the 
guest has already exited before the IPI is issued, it's effectively a no-op 
(well, really it's just a wasted IPI/reschedule event, but no harm is done).  
Does this make sense?  Did I miss something else?
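
In case it helps, here is the entry-side pattern I have in mind (illustration 
only; the names are made up and the real check is folded into the run path 
before we enter the guest):

    /*
     * Because irq.pending is tested under the same irq.lock that
     * kvm_vcpu_intr() holds while setting it, a kick that arrives after the
     * vcpu has already bailed out costs at most a spurious IPI or signal;
     * the interrupt itself is never lost.
     */
    static int vcpu_pending_check(struct kvm_vcpu *vcpu)   /* made-up name */
    {
            unsigned long flags;
            int pending;

            spin_lock_irqsave(&vcpu->irq.lock, flags);
            pending = (vcpu->irq.pending != 0);
            if (!pending)
                    vcpu->irq.guest_mode = 1;  /* new intrs will IPI/signal us */
            spin_unlock_irqrestore(&vcpu->irq.lock, flags);

            return pending;  /* caller injects before VMENTRY if set */
    }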


