Hi Avi, Sorry for the delay. I have been traveling this week. See inline...
>>> On Tue, Apr 24, 2007 at 5:09 AM, in message <[EMAIL PROTECTED]>,
Avi Kivity <[EMAIL PROTECTED]> wrote:
> Gregory Haskins wrote:
>>>> +
>>>> +struct kvm_irqdevice {
>>>> +	int (*ack)(struct kvm_irqdevice *this, int *vector);
>>>> +	int (*set_pin)(struct kvm_irqdevice *this, int pin, int level);
>>>> +	int (*summary)(struct kvm_irqdevice *this, void *data);
>>>> +	void (*destructor)(struct kvm_irqdevice *this);
>>>>
>>>>
>>> [do we actually need a virtual destructor?]
>>>
>>
>> I believe it is the right thing to do, yes.  The implementation of the
> irqdevice destructor may be as simple as a kfree(), or could be arbitrarily
> complex (don't forget that we will have multiple models..we already have
> three: userint, kernint, and lapic.  There may also be i8259 and
> i8259_cascaded in the future).
>>
>>
>
> Yes, but does it need to be a function pointer?  IOW, is the point it is
> called generic code or already irqdevice-specific?

The code can-be/is irqdevice specific, thus the virtual.  In some cases, it
will be as simple as a kfree().
In others (kernint, for instance), it might need to drop references to the
apic/ext devices and do other cleanup (which reminds me that I should look
at this to make sure it's done right today) ;)

>
>>
>>>> +/**
>>>> + * kvm_irqdevice_ack - read and ack the highest priority vector from the device
>>>> + * @dev: The device
>>>> + * @vector: Retrieves the highest priority pending vector
>>>> + *          [ NULL = Dont ack a vector, just check pending status]
>>>> + *          [ non-NULL = Pointer to recieve vector data (out only)]
>>>> + *
>>>> + * Description: Read the highest priority pending vector from the device,
>>>> + *              potentially invoking auto-EOI depending on device policy
>>>> + *
>>>> + * Returns: (int)
>>>> + *   [ -1 = failure]
>>>> + *   [>=0 = bitmap as follows: ]
>>>> + *         [ KVM_IRQACK_VALID = vector is valid]
>>>> + *         [ KVM_IRQACK_AGAIN = more unmasked vectors are available]
>>>> + *         [ KVM_IRQACK_TPRMASK = TPR masked vectors are blocked]
>>>> + */
>>>> +static inline int kvm_irqdevice_ack(struct kvm_irqdevice *dev,
>>>> +				    int *vector)
>>>> +{
>>>> +	return dev->ack(dev, vector);
>>>> +}
>>>>
>>>>
>>> This is an improvement over the previous patch, but I'm vaguely
>>> disturbed by the complexity of the return code.  I don't have an
>>> alternative to suggest at this time, though.
>>>
>>
>> Would you prefer to see a by-ref flags field passed in coupled with a more
> traditional return code?
>>
>>
>
> While I enjoy nitpicking on the names and types of parameters, my
> concern here is the exploding number of combinations, each of which can
> be used by the arch to hide bugs in.
>
> Bugs in this code are going to be exceedingly hard to debug; they'll be
> by nature non-repeatable and timing-sensitive, and as the OS that makes
> heaviest use of the APIC and tends to crash at the slightest
> mis-emulation is closed source, much of the debugging is done by staring
> at the code.
>
> We already have a report that about missing mouse clicks, which is
> possibly caused by interrupt mis-emulation.  If you want to know exactly
> why I'm worried about increasing complexity, try to debug it.
>
> [Of course, complexity inevitably grows, and even when people remove
> code and simplify things, usually it is in order to add even more code
> and more complexity.  But I want to be on the right side of the
> complexity/performance/flexibility/stability tradeoff.]

We are on the same page here.  I have been striving, and will continue to
strive, to make design choices here that are sensitive to these and other
similar issues.  As always, comments on ways to improve these choices are
welcome.

>
>>>
>>>> + * have to use the new API
>>>> + */
>>>> +static inline int __kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	int pending = __kvm_vcpu_irq_all_pending(vcpu);
>>>> +
>>>> +	if (test_bit(kvm_irqpin_localint, &pending) ||
>>>> +	    test_bit(kvm_irqpin_extint, &pending))
>>>> +		return 1;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static inline int kvm_vcpu_irq_pending(struct kvm_vcpu *vcpu)
>>>> +{
>>>> +	int ret = 0;
>>>> +	int flags;
>>>> +
>>>> +	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>>> +	ret = __kvm_vcpu_irq_pending(vcpu);
>>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>>>
>>>>
>>> The locking seems superfluous.
>>>
>>
>> I believe there are places where we need to call the locked version of
> kvm_vcpu_irq_pending in the code, but I will review this to make sure.
>>
>>
>
> I meant, __kvm_vcpu_irq_pending is just reading stuff.

Ah, I see.  I am not 100% sure about this, but I think you can make the same
argument here as you can with that "double check locks are broken" article
that you sent out.  If I got anything out of that article (it was very
interesting, BTW), it's that the locks do more than protect critical
sections: they are also an implicit memory barrier.  I am under the
impression that we want that behavior here.
I can be convinced otherwise...

>
>>
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +static inline void __kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>>> +{
>>>> +	BUG_ON(vcpu->irq.deferred != -1); /* We can only hold one deferred */
>>>> +
>>>> +	vcpu->irq.deferred = irq;
>>>> +}
>>>> +
>>>> +static inline void kvm_vcpu_irq_push(struct kvm_vcpu *vcpu, int irq)
>>>> +{
>>>> +	int flags;
>>>> +
>>>> +	spin_lock_irqsave(&vcpu->irq.lock, flags);
>>>> +	__kvm_vcpu_irq_push(vcpu, irq);
>>>> +	spin_unlock_irqrestore(&vcpu->irq.lock, flags);
>>>> +}
>>>> +
>>>>
>>>>
>>> Can you explain the logic behind push()/pop()?  I realize you inherited
>>> it, but I don't think it fits well into the new model.
>>>
>>
>> It seems you have already figured this out in your later comments, but just
> to make sure we are clear I will answer your question anyway: The problem as
> I see it is that real-world PICs have the notion of an interrupt being
> accepted by the CPU during the acknowledgment cycle.  What happens during
> that cycle is PIC dependent, but for something like an 8259 or LAPIC,
> generally it means at least moving the pending bit from the IRR to the ISR
> register.  Once the vector is acknowledged, it is considered dispatched to
> the CPU.  However, for VMs this is not always an atomic operation (e.g. the
> injection may fail under a certain set of circumstances such as those that
> cause a VMEXIT before the injection is complete).  During those cases, we
> don't want to lose the interrupt so something must be done to preserve our
> current state for the next injection window.
>>
>> In the original KVM code, the vector was simply re-inserted back into the
> (effective) userint model's state.  This solved the problem neatly albeit
> potentially unnaturally when compared to the real-world.  When you introduce
> the models of actual PICs things get more complex.
> I had a choice between somehow aborting the previously accepted vector, or
> adding a new layer between the PIC and the vCPU (e.g. irq.deferred).  Since
> the real-world PICs have no notion of "abort-ack", it would have been
> unnatural to add that feature at that layer.  In addition, the operation
> would have to be supported with each model.  The irq.deferred code works
> with all models and doesn't require a hack to the emulation of the PIC(s).
> It moves the problem to the VCPU which is the layer where the difference is
> (PCPU vs VCPU).
>>
>>
>
> But, once the vcpu gets back to the deferred irq, the tpr may have
> changed and no longer allow acceptance of this irq.

True, but I am not convinced this is a problem (see below).

>
> Thinking a bit about this, the current code suffers from the same
> problem.

Right.

> I guess it works because no OS is insane enough to page out
> the IDT or GDT, so the only faults we can get are handled by kvm, not
> the guest.

This is my thinking as well.  The conditions that cause an injection failure
are probably relatively lightweight w.r.t. the guest's execution context.
For instance, maybe an NMI comes in during the VMENTRY and causes an
immediate VMEXIT (i.e. the guest never made any forward progress, and
therefore nothing else, such as the TPR, has changed).

>
> So it seems the correct description is not 'un-ack the interrupt', as we
> have effectively acked it, but actually queue it pending host-only kvm
> processing.

This is exactly what I have done (if I understood what you were saying).
When the injection fails we push the vector to the irq.deferred entry, which
takes a higher priority in the queue than the backing irqdevice (since the
irqdevice believes the vector is already dispatched).

> I'm not 100% sure that's the only case, though.

Yeah, me neither.  Let's hope so for now, and we can address it when
something comes along to reveal that this was an incorrect assumption.
Otherwise we could be doing something ugly to the emulation for no good
reason.

_______________________________________________
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel
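
For anyone following the push()/pop() discussion above, here is a minimal
user-space sketch of the irq.deferred idea: a vector that the PIC has already
acked, but that could not be injected, is parked on the vCPU and handed out
ahead of the backing irqdevice at the next injection window.  This is not the
patch code.  Every sketch_* name is hypothetical, the toy PIC holds a single
pending vector, assert() stands in for BUG_ON(), and the irq.lock locking is
left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for KVM_IRQACK_VALID; the value is an assumption. */
#define SKETCH_IRQACK_VALID 0x1

/* Simplified irqdevice: only the ack() hook, plus toy PIC state. */
struct sketch_irqdevice {
	/* Returns a flag bitmap (>= 0); 0 means nothing was delivered. */
	int (*ack)(struct sketch_irqdevice *self, int *vector);
	int pending_vector;	/* single pending vector, -1 = none */
};

/* Simplified vCPU: a backing device plus one deferred slot. */
struct sketch_vcpu {
	struct sketch_irqdevice *dev;
	int deferred;		/* acked but not yet injected, -1 = none */
};

/* Toy PIC: acking hands out the vector and forgets it (IRR -> ISR analog). */
static int sketch_pic_ack(struct sketch_irqdevice *self, int *vector)
{
	if (self->pending_vector < 0)
		return 0;
	if (vector) {		/* NULL = only check pending status */
		*vector = self->pending_vector;
		self->pending_vector = -1;
	}
	return SKETCH_IRQACK_VALID;
}

/* Injection failed after the ack: park the vector on the vCPU. */
static void sketch_vcpu_irq_push(struct sketch_vcpu *vcpu, int vector)
{
	assert(vcpu->deferred == -1);	/* only one deferred vector at a time */
	vcpu->deferred = vector;
}

/* Next injection window: the deferred vector wins over the backing device. */
static int sketch_vcpu_irq_pop(struct sketch_vcpu *vcpu, int *vector)
{
	if (vcpu->deferred != -1) {
		*vector = vcpu->deferred;
		vcpu->deferred = -1;
		return SKETCH_IRQACK_VALID;
	}
	return vcpu->dev->ack(vcpu->dev, vector);
}
```

The point of the layering is that the PIC model never needs an "abort-ack"
operation: a failed injection is followed by sketch_vcpu_irq_push(), and the
next sketch_vcpu_irq_pop() returns that same vector before asking the device
for anything newer.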