Gregory Haskins wrote:
>>
>>>
>>>  	vcpu->cpu = -1;
>>>  	vcpu->kvm = kvm;
>>> @@ -366,13 +370,20 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
>>>
>>>  static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>> +	unsigned long irqsave;
>>> +
>>>  	if (!vcpu->vmcs)
>>>  		return;
>>>
>>>  	vcpu_load(vcpu);
>>>  	kvm_mmu_destroy(vcpu);
>>>  	vcpu_put(vcpu);
>>> +
>>> +	spin_lock_irqsave(&vcpu->irq.lock, irqsave);
>>> +	vcpu->irq.task = NULL;
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, irqsave);
>>>
>> Can irq.task be non-NULL here at all?  Also, we only free vcpus when we
>> destroy the vm, and paravirt drivers would hopefully hold a ref to the
>> vm, so there's nobody to race against here.
>>
>
> I am perhaps being a bit overzealous here.  What I found in practice is that
> the LVTT can screw things up on shutdown, so I was being pretty conservative
> on the synchronization here.
>
That may point to a different sync problem.  All pending timers ought to
have been cancelled before we reach here.  Please check to make sure this
isn't papering over another problem.

>>>  	kvm_irqdevice_destructor(&vcpu->irq.dev);
>>>
>>> @@ -1868,6 +1880,10 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>>>  					  struct kvm_run *kvm_run)
>>>  		kvm_arch_ops->decache_regs(vcpu);
>>>  	}
>>>
>>> +	spin_lock_irqsave(&vcpu->irq.lock, irqsaved);
>>> +	vcpu->irq.task = current;
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, irqsaved);
>>> +
>>>
>> Just assignment + __smp_wmb().
>>
>
> (This comment applies to all of the subsequent reviews where memory barriers
> are recommended instead of locks:)
>
> I can't quite wrap my head around whether all these critical sections are
> correct with just a barrier instead of a full-blown lock.  I would prefer
> to be conservative and leave them as locks for now.  Someone with better
> insight could make a second pass and optimize the locks into barriers where
> appropriate.  I am just uncomfortable doing it feeling confident that I am
> not causing races.  If you insist on making the changes before the code is
> accepted, ok.  Just note that I am not comfortable ;)
>

I approach it from the other direction: to me, a locked assignment says that
something is fundamentally wrong.  Usually anything under a lock is a
read-modify-write operation; otherwise the writes just stomp on each other.
This is the source of all the race-after-vmexit-irq-check comments you've
been getting from me.  No matter how many times you explain it, every time
I see it the automatic race alarm pops up.
>>> +/*
>>>  * This function will be invoked whenever the vcpu->irq.dev raises its
>>>  * INTR line
>>>  */
>>> @@ -2318,10 +2348,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>>>  {
>>>  	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)this->private;
>>>  	unsigned long flags;
>>> +	int direct_ipi = -1;
>>>
>>>  	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> -	__set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +	if (!test_bit(pin, &vcpu->irq.pending)) {
>>> +		/*
>>> +		 * Record the change..
>>> +		 */
>>> +		__set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +		/*
>>> +		 * then wake up the vcpu (if necessary)
>>> +		 */
>>> +		if (vcpu->irq.task && (vcpu->irq.task != current)) {
>>> +			if (vcpu->irq.guest_mode) {
>>> +				/*
>>> +				 * If we are in guest mode, we can optimize
>>> +				 * the IPI by executing a function directly
>>> +				 * on the owning processor.
>>> +				 */
>>> +				direct_ipi = task_cpu(vcpu->irq.task);
>>> +				BUG_ON(direct_ipi == smp_processor_id());
>>> +			} else
>>> +				/*
>>> +				 * otherwise, we must assume that we could be
>>> +				 * blocked anywhere, including userspace.  Send
>>> +				 * a signal to give everyone a chance to get
>>> +				 * notification
>>> +				 */
>>> +				send_sig(vcpu->irq.signo, vcpu->irq.task, 0);
>>> +		}
>>> +	}
>>> +
>>>  	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>> +
>>> +	if (direct_ipi != -1) {
>>> +		/*
>>> +		 * Not sure if disabling preemption is needed.
>>> +		 * The kick_process() code does this so I copied it
>>> +		 */
>>> +		preempt_disable();

[preemption is disabled here anyway]

>>> +		smp_call_function_single(direct_ipi,
>>> +					 kvm_vcpu_guest_intr,
>>> +					 vcpu, 0, 0);
>>> +		preempt_enable();
>>> +	}
>>>
>> I see why you must issue the IPI outside the spin_lock_irqsave(), but
>> aren't you now opening a race?  vcpu enters guest mode, irq on other
>> cpu, irq sets direct_ipi to wakeup guest, releases lock, vcpu exits to
>> userspace (or migrates to another cpu), ipi is issued but nobody cares.
>>
>
> It's subtle, but I think it's ok.
> The race is actually against the setting of the irq.pending.  This *must*
> happen inside the lock or the guest could exit and miss the interrupt.
> Once the pending bit is set, however, the guest can be woken up in any old
> fashion and the behavior should be correct.  If the guest has already
> exited before the IPI is issued, it's effectively a no-op (well, really
> it's just a wasted IPI/reschedule event, but no harm is done).  Does this
> make sense?  Did I miss something else?
>

No, you are correct wrt the vcpu migrating to another cpu.  What about
vs. exit to userspace, where we may sleep?

--
error compiling committee.c: too many arguments to function