Gregory Haskins wrote:
>>
>>>
>>>  	vcpu->cpu = -1;
>>>  	vcpu->kvm = kvm;
>>> @@ -366,13 +370,20 @@ static void free_pio_guest_pages(struct kvm_vcpu *vcpu)
>>>
>>>  static void kvm_free_vcpu(struct kvm_vcpu *vcpu)
>>>  {
>>> +	unsigned long irqsave;
>>> +
>>>  	if (!vcpu->vmcs)
>>>  		return;
>>>
>>>  	vcpu_load(vcpu);
>>>  	kvm_mmu_destroy(vcpu);
>>>  	vcpu_put(vcpu);
>>> +
>>> +	spin_lock_irqsave(&vcpu->irq.lock, irqsave);
>>> +	vcpu->irq.task = NULL;
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, irqsave);
>>>
>> Can irq.task be non-NULL here at all?  Also, we only free vcpus when we
>> destroy the vm, and paravirt drivers would hopefully hold a ref to the
>> vm, so there's nobody to race against here.
>>
>
> I am perhaps being a bit overzealous here.  What I found in practice is that
> the LVTT can screw things up on shutdown, so I was being pretty conservative
> on the synchronization here.
>
That may point to a different sync problem.  All pending timers ought to
have been cancelled before we reach here.  Please check to make sure this
isn't papering over another problem.

>>>  	kvm_irqdevice_destructor(&vcpu->irq.dev);
>>>
>>> @@ -1868,6 +1880,10 @@ static int kvm_vcpu_ioctl_run(struct kvm_vcpu *vcpu,
>>>  					  struct kvm_run *kvm_run)
>>>  		kvm_arch_ops->decache_regs(vcpu);
>>>  	}
>>>
>>> +	spin_lock_irqsave(&vcpu->irq.lock, irqsaved);
>>> +	vcpu->irq.task = current;
>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, irqsaved);
>>> +
>>>
>> Just assignment + __smp_wmb().
>>
>
> (This comment applies to all of the subsequent reviews where memory barriers
> are recommended instead of locks:)
>
> I can't quite wrap my head around whether all these critical sections are
> correct with just a barrier instead of a full-blown lock.  I would prefer
> to be conservative and leave them as locks for now.  Someone with better
> insight could make a second pass and optimize the locks into barriers where
> appropriate.  I am just uncomfortable doing it feeling confident that I am
> not causing races.  If you insist on making the changes before the code is
> accepted, ok.  Just note that I am not comfortable ;)
>

I approach it from the other direction: to me, a locked assignment says that
something is fundamentally wrong.  Usually anything under a lock is a
read-modify-write operation; otherwise the writes just stomp on each other.
This is the source of all the race-after-vmexit-irq-check comments you've
been getting from me.  No matter how many times you explain it, every time
I see it the automatic race alarm pops up.
>>> +/*
>>>  * This function will be invoked whenever the vcpu->irq.dev raises its
>>>  * INTR line
>>>  */
>>> @@ -2318,10 +2348,52 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
>>>  {
>>>  	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)this->private;
>>>  	unsigned long flags;
>>> +	int direct_ipi = -1;
>>>
>>>  	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> -	__set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +	if (!test_bit(pin, &vcpu->irq.pending)) {
>>> +		/*
>>> +		 * Record the change..
>>> +		 */
>>> +		__set_bit(pin, &vcpu->irq.pending);
>>> +
>>> +		/*
>>> +		 * then wake up the vcpu (if necessary)
>>> +		 */
>>> +		if (vcpu->irq.task && (vcpu->irq.task != current)) {
>>> +			if (vcpu->irq.guest_mode) {
>>> +				/*
>>> +				 * If we are in guest mode, we can optimize
>>> +				 * the IPI by executing a function directly
>>> +				 * on the owning processor.
>>> +				 */
>>> +				direct_ipi = task_cpu(vcpu->irq.task);
>>> +				BUG_ON(direct_ipi == smp_processor_id());
>>> +			} else
>>> +				/*
>>> +				 * otherwise, we must assume that we could be
>>> +				 * blocked anywhere, including userspace.  Send
>>> +				 * a signal to give everyone a chance to get
>>> +				 * notification
>>> +				 */
>>> +				send_sig(vcpu->irq.signo, vcpu->irq.task, 0);
>>> +		}
>>> +	}
>>> +
>>>  	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>> +
>>> +	if (direct_ipi != -1) {
>>> +		/*
>>> +		 * Not sure if disabling preemption is needed.
>>> +		 * The kick_process() code does this so I copied it
>>> +		 */
>>> +		preempt_disable();

[preemption is disabled here anyway]

>>> +		smp_call_function_single(direct_ipi,
>>> +					 kvm_vcpu_guest_intr,
>>> +					 vcpu, 0, 0);
>>> +		preempt_enable();
>>> +	}
>>>
>> I see why you must issue the IPI outside the spin_lock_irqsave(), but
>> aren't you now opening a race?  vcpu enters guest mode, irq on other
>> cpu, irq sets direct_ipi to wakeup guest, releases lock, vcpu exits to
>> userspace (or migrates to another cpu), ipi is issued but nobody cares.
>>
>
> It's subtle, but I think it's ok.
> The race is actually against the setting of the irq.pending.  This *must*
> happen inside the lock or the guest could exit and miss the interrupt.
> Once the pending bit is set, however, the guest can be woken up in any old
> fashion and the behavior should be correct.  If the guest has already
> exited before the IPI is issued, it's effectively a no-op (well, really
> it's just a wasted IPI/reschedule event, but no harm is done).  Does this
> make sense?  Did I miss something else?
>

No, you are correct wrt the vcpu migrating to another cpu.  What about
vs. exit to userspace, where we may sleep?

--
error compiling committee.c: too many arguments to function