On Tue, 2007-06-19 at 17:48 +0300, Avi Kivity wrote:
> Gregory Haskins wrote:
> > Halting in userspace requires a relatively cumbersome mechanism to signal 
> > the
> > halted VCPU.  Implementing halt in kernel should be relatively straight
> > forward and it eliminates the need for the signaling
> >
> >   
> 
> Merging this one in, found some nits:


I believe you will find that this implementation works correctly and
possibly even optimally for the situation.  I crafted it after carefully
balancing the design requirements with observed behavior through many
rounds of testing.  I do have a bit of experience in this area of linux,
though I am not an out-and-out expert.  Therefore its possible there are
other/better ways to do this. Based on this, I will just detail why I
did things this way and you guys can poke holes in it ;)

> 
> > +/*
> >   * This function is invoked whenever we want to interrupt a vcpu that is
> >   * currently executing in guest-mode.  It currently is a no-op because
> >   * the simple delivery of the IPI to execute this function accomplishes our
> > @@ -2481,6 +2556,16 @@ static void kvm_vcpu_intr(struct kvm_irqsink *this,
> >                     BUG_ON(direct_ipi == smp_processor_id());
> >                     ++vcpu->stat.guest_preempt;
> >             }
> > +
> > +           /*
> > +            * If the CPU is halted it will be waiting for a wake-up
> > +            */
> > +           if (waitqueue_active(&vcpu->irq.wq)) {
> >   
> 
> Why do the check?  The only reason I can see is to keep the stats 
> correct.  Otherwise we can do the body of the if unconditionally.

I agree the waitqueue_active() check is superfluous if you are only
waking up a wait_queue and nothing else.  However, one reason to have it
this way is the stat update (as you pointed out).  More importantly,
another reason is that we cannot use a simple wake_up here (I believe)
due to constraints of the different contexts that may invoke this
function.  Therefore we have to use wake_up_sync with a deferred
reschedule.   We could do these operations unconditionally, but having
an unnecessary reschedule seems like it would be wasteful at best.

> 
> > +                   wake_up_interruptible_sync(&vcpu->irq.wq);
> > +                   set_tsk_need_resched(current);
> >   
> 
> This is unneeded?  I'd expect wake_up_interruptible_sync() to take care 
> of any rescheduling needed.

IIUC, a regular wake_up() does two primary things:  1) it moves an item
from a wait_queue to the run_queue, and 2) it calls schedule.

wake_up_sync() only does (1) and expects that the caller will call
schedule() later on its own.  In our case, we cannot safely call
schedule() directly anywhere in the interrupt code because there are
multiple paths that can cause the apic to inject something, some of
which are not conducive to sleeping (for example hrtimer callback driven
LVTT interrupts).   Therefore, we mark the task as needing rescheduling
to defer scheduling before we return from this function, but to also
induce scheduling ASAP afterwards.

IIRC, if you remove the NEED_RESCHED, sometimes the LVTT will fail to
kick the vcpu out of halt until the next natural schedule point happens
orthogonal to this codepath.  This results in unexpected behavior in the
guest, and is what drove me to add it.


Regards,
-Greg


-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel

Reply via email to