Gregory Haskins wrote:
>>> +
>>> +struct kvm_irqdevice {
>>> +   int  (*ack)(struct kvm_irqdevice *this, int *vector);
>>> +   int  (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
>>> +   int  (*summary)(struct kvm_irqdevice *this, void *data);
>>> +   void (*destructor)(struct kvm_irqdevice *this);
>>>   
>>>       
>> [do we actually need a virtual destructor?]
>>     
>
> I believe it is the right thing to do, yes.  The implementation of the 
> irqdevice destructor may be as simple as a kfree(), or could be arbitrarily 
> complex (don't forget that we will have multiple models... we already have 
> three: userint, kernint, and lapic.  There may also be i8259 and 
> i8259_cascaded in the future).
>
>   

Yes, but does it need to be a function pointer? IOW, is the place it is
called from generic code, or already irqdevice-specific?
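
To make the question concrete: if the only caller is generic teardown
code along these lines (the helper name is made up, this is just a
sketch), the pointer earns its keep; if each model already tears itself
down, a direct call or a plain kfree() would do:

static void kvm_irqdevice_destroy(struct kvm_irqdevice *dev)
{
	/* generic code only sees the base type, so model-specific
	 * cleanup (which may be no more than a kfree()) has to go
	 * through the virtual destructor
	 */
	if (dev->destructor)
		dev->destructor(dev);
}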

>   
>>> +/**
>>> + * kvm_irqdevice_ack - read and ack the highest priority vector from the device
>>> + * @dev: The device
>>> + * @vector: Retrieves the highest priority pending vector
>>> + *                [ NULL = Don't ack a vector, just check pending status]
>>> + *                [ non-NULL = Pointer to receive vector data (out only)]
>>> + *
>>> + * Description: Read the highest priority pending vector from the device, 
>>> + *              potentially invoking auto-EOI depending on device policy
>>> + *
>>> + * Returns: (int)
>>> + *   [ -1 = failure]
>>> + *   [>=0 = bitmap as follows: ]
>>> + *         [ KVM_IRQACK_VALID   = vector is valid]
>>> + *         [ KVM_IRQACK_AGAIN   = more unmasked vectors are available]
>>> + *         [ KVM_IRQACK_TPRMASK = TPR masked vectors are blocked]
>>> + */
>>> +static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev, 
>>> +                                       int *vector)
>>> +{
>>> +   return dev->ack(dev, vector);
>>> +}
>>>   
>>>       
>> This is an improvement over the previous patch, but I'm vaguely 
>> disturbed by the complexity of the return code. I don't have an 
>> alternative to suggest at this time, though.
>>     
>
> Would you prefer to see a by-ref flags field passed in coupled with a more 
> traditional return code?
>
>   

While I enjoy nitpicking on the names and types of parameters, my
concern here is the exploding number of combinations, each of which
gives the arch code another place to hide bugs.

Bugs in this code are going to be exceedingly hard to debug; they'll be
by nature non-repeatable and timing-sensitive, and as the OS that makes
heaviest use of the APIC and tends to crash at the slightest
mis-emulation is closed source, much of the debugging is done by staring
at the code.

We already have a report about missing mouse clicks, which is possibly
caused by interrupt mis-emulation.  If you want to know exactly why I'm
worried about increasing complexity, try to debug it.

[Of course, complexity inevitably grows, and even when people remove
code and simplify things, usually it is in order to add even more code
and more complexity.  But I want to be on the right side of the
complexity/performance/flexibility/stability tradeoff.]
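
To be concrete about the alternative mentioned above, I suppose it
would look something like this (sketch only, and not something I'm
endorsing; the ->ack op would have to grow the extra parameter too):

static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev,
				    int *vector, unsigned long *flags)
{
	/* status bits (VALID/AGAIN/TPRMASK) come back via *flags,
	 * and the return value shrinks to 0 or -errno
	 */
	return dev->ack(dev, vector, flags);
}

It trades a packed return value for one more out parameter; I'm not
sure it reduces the number of combinations the arch code has to get
right, which is the part that worries me.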
>
>   
>>> +/**
>>> + * kvm_irqdevice_set_intr - invokes a registered INTR callback
>>> + * @dev: The device
>>> + * @pin: Identifies the pin to alter -
>>> + *           [ KVM_IRQPIN_LOCALINT (default) - a vector is pending on this
>>> + *                                             device]
>>> + *           [ KVM_IRQPIN_EXTINT - a vector is pending on an external
>>> + *                                 device]
>>> + *           [ KVM_IRQPIN_SMI - system-management-interrupt pin]
>>> + *           [ KVM_IRQPIN_NMI - non-maskable-interrupt pin]
>>> + * @trigger: sensitivity [0 = edge, 1 = level]
>>> + * @val: [0 = deassert (ignored for edge-trigger), 1 = assert]
>>> + *
>>> + * Description: Invokes a registered INTR callback (if present).  This
>>> + *              function is meant to be used privately by an irqdevice 
>>> + *              implementation. 
>>> + *
>>> + * Returns: (void)
>>> + */
>>> +static inline void kvm_irqdevice_set_intr(struct kvm_irqdevice *dev,
>>> +                                     kvm_irqpin_t pin, int trigger,
>>> +                                     int val)
>>> +{
>>> +   struct kvm_irqsink *sink = &dev->sink;
>>> +   if (sink->set_intr)
>>> +           sink->set_intr(sink, dev, pin, trigger, val);
>>> +}
>>>   
>>>       
>> Do you see more than one implementation for ->set_intr (e.g. for 
>> cascading)? If not, it can be de-pointered.
>>     
>
> Yeah, I definitely see more than one consumer.  Case in point, the kernint 
> module that was included in this series registers intr() handlers for its two 
> irqdevices (apic and ext).  Also, if we end up having level-2 support we 
> will be using it even more for the cascaded i8259s.
>   

Okay.

 

>>
>>> + *  have to use the new API
>>> + */
>>> +static inline int __kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>> +{
>>> +   int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>> +
>>> +   if (test_bit(kvm_irqpin_localint, &pending) ||
>>> +       test_bit(kvm_irqpin_extint, &pending))
>>> +           return 1;
>>> +   
>>> +   return 0;
>>> +}
>>> +
>>> +static inline int kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>> +{
>>> +   int ret = 0;
>>> +   int flags;
>>> +
>>> +   spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> +   ret = __kvm_vcpu_irq_pending(vcpu);
>>> +   spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>>   
>>>       
>> The locking seems superfluous.
>>     
>
> I believe there are places where we need to call the locked version of 
> kvm_vcpu_irq_pending in the code, but I will review this to make sure.
>
>   

I meant, __kvm_vcpu_irq_pending is just reading stuff.
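
IOW, as far as I can tell the locked wrapper buys nothing and could
simply be (sketch):

static inline int kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
{
	/* this only reads the pending bits; the answer is stale the
	 * moment the lock is dropped anyway, so the spinlock adds
	 * nothing
	 */
	return __kvm_vcpu_irq_pending(vcpu);
}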

>   
>>> +
>>> +   return ret;
>>> +}
>>> +
>>> +static inline void __kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>> +{
>>> +   BUG_ON(vcpu->irq.deferred != -1); /* We can only hold one deferred */
>>> +
>>> +   vcpu->irq.deferred = irq;
>>> +}
>>> +
>>> +static inline void kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>> +{
>>> +   int flags;
>>> +
>>> +   spin_lock_irqsave(&vcpu->irq.lock, flags);
>>> +   __kvm_vcpu_irq_push(vcpu, irq);
>>> +   spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>> +}
>>> +
>>>   
>>>       
>> Can you explain the logic behind push()/pop()? I realize you inherited 
>> it, but I don't think it fits well into the new model.
>>     
>
> It seems you have already figured this out in your later comments, but just 
> to make sure we are clear I will answer your question anyway:  The problem as 
> I see it is that real-world PICs have the notion of an interrupt being 
> accepted by the CPU during the acknowledgment cycle.  What happens during 
> that cycle is PIC dependent, but for something like an 8259 or LAPIC, 
> generally it means at least moving the pending bit from the IRR to the ISR 
> register.  Once the vector is acknowledged, it is considered dispatched to 
> the CPU.  However, for VMs this is not always an atomic operation (e.g. the 
> injection may fail under a certain set of circumstances such as those that 
> cause a VMEXIT before the injection is complete).  During those cases, we 
> don't want to lose the interrupt so something must be done to preserve our 
> current state for the next injection window.
>
> In the original KVM code, the vector was simply re-inserted back into the 
> (effective) userint model's state.  This solved the problem neatly, albeit 
> potentially unnaturally when compared to the real world.  When you introduce 
> the models of actual PICs things get more complex.  I had a choice between 
> somehow aborting the previously accepted vector, or adding a new layer 
> between the PIC and the vCPU (e.g. irq.deferred).  Since the real-world PICs 
> have no notion of "abort-ack", it would have been unnatural to add that 
> feature at that layer.  In addition, the operation would have to be supported 
> with each model.  The irq.deferred code works with all models and doesn't 
> require a hack to the emulation of the PIC(s).   It moves the problem to the 
> VCPU which is the layer where the difference is (PCPU vs VCPU).
>
>   

But, once the vcpu gets back to the deferred irq, the tpr may have
changed and no longer allow acceptance of this irq.

Thinking a bit about this, the current code suffers from the same
problem.  I guess it works because no OS is insane enough to page out
the IDT or GDT, so the only faults we can get are handled by kvm, not
the guest.

So it seems the correct description is not 'un-ack the interrupt', as we
have effectively acked it, but actually queue it pending host-only kvm
processing.  I'm not 100% sure that's the only case, though.
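
For reference, this is roughly how I picture the consumer side of
irq.deferred (hypothetical helper, I haven't checked it against your
actual pop() code): the deferred vector is handed back to the injection
path ahead of any fresh ack from the device, i.e. it is queued for
host-side reinjection rather than un-acked at the PIC.

static inline int __kvm_vcpu_irq_pop(struct kvm_vcpu *vcpu)
{
	int irq = vcpu->irq.deferred;

	/* consume the deferred vector, if any, before asking the
	 * irqdevice for a new one; -1 means nothing was deferred
	 */
	vcpu->irq.deferred = -1;
	return irq;
}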

>   
>>>  static inline void clgi(void)
>>>  {
>>>     asm volatile (SVM_CLGI);
>>> @@ -892,7 +874,12 @@ static int pf_interception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
>>>     int r;
>>>  
>>>     if (is_external_interrupt(exit_int_info))
>>> -           push_irq(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
>>> +           /*
>>> +            * An exception was taken while we were trying to inject an
>>> +            * IRQ.  We must defer the injection of the vector until
>>> +            * the next window.
>>> +            */
>>> +           kvm_vcpu_irq_push(vcpu, exit_int_info & SVM_EVTINJ_VEC_MASK);
>>>   
>>>       
>> Ah, I remember what push/pop is for now. We actually have ->ack() to 
>> deal with this now. Unfortunately with auto-eoi we don't have a good 
>> place to call it. So push() is a kind of unack() for eoi interrupts.
>>     
>
> Sort of.  I think my explanation above covers this, so I wont go into it 
> deeper here.
>
>   

Yeah.  Well, at least some of the uses are not unack() related, and we
can't really do unack(), so I was wrong.




-- 
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

