On Thu, 2007-06-21 at 21:02 +0800, Dong, Eddie wrote: > Gregory Haskins wrote: > > On Wed, 2007-06-20 at 15:43 +0800, Dong, Eddie wrote: > >> As we discussed, if we move APIC to kernel while leaving PIC in user > >> level, we have ABI level holes if the interrupt a user level qemu > >> injected is not immediately injected to guest for some reason. \ > > > > Hi Eddie, > > > > I know you worked hard on this, but I still do not agree with this > > statement. I agree that v9 introduced an issue related to a) > > premature IRR->ISR assignment, and b) lack of support for clearing > > pending ISR with IMR changes (and thank you for pointing it out!). > > Achitectually not only that. A premature IRR->ISR will cause guest OS > confuse in many place. A guest (say BIOS) may turn from interrupt > enabled mode to polling mode which polls IRR to detect if > there is pending IRQ to handle. In this case we have trouble.
I completely agree. But what I am saying is that I can eliminate the premature IRR->ISR with the change I am proposing. > > > > However, as I mentioned we can fix that issue with just a few new > > lines of > > code and by > > reverting the userspace injection model to the one used in prior > > releases without having to implement an entire in-kernel PIC to do it. > > In-kernel PIC gives us a full in kernel irqchip solution and performance > gain. Some OSes may use PIC only. Agreed. I will answer this down below with the level-0/1/2 question since they are related. > > > > > I think moving the PIC into the kernel has the advantage of allowing > > us to move the PIT into the kernel too (which is huge, IMHO), but > > Not very big, I just want to do one by one. we have done the code > in Xen already. > > > that is its sole advantage. Don't get me wrong: I am in favor of > > doing it. However, I wanted to point out that this particular > > solution to the problem you found essentially is invalidating > > "level-1" mode by only supporting level-0 or level-2, not fixing it. > > Not exactly understand level-0/1/2 meaning? Do u mean we mixed > irqchip mode is a feature requirement? I didn't see the necessity, > maybe I am wrong. But isn't both PIC and APIC in kernel much > nature and simple for us? At one point there was a debate between moving just the LAPIC, or moving everything (LAPIC/PIC/IOAPIC/PIT). Avi suggested that we start with just the LAPIC and see where we were at. Because we need to remain forward and backwards ABI compatible, I added an ioctl for setting the in-kernel-pic level. "0" is backwards compatible mode (everything in userspace). "1" is LAPIC only. "2" is presumable everything in the kernel, but today only 0 and 1 are the only two defined so there could be more than 3 levels someday if we wanted. > > If you really think supporting mixed irqchip mode is a must I will leave it to Avi to decide since he implicitly suggested it. But suffice to say, if we *dont* want to support it we will need to get the other in-kernel stuff into the lapic branch in its entirety before it can be merged so we dont create an ABI issue. > , we > need to introduce an intack I/F to notify user level irqchip if the irq > is really ACKed or not. We can do that but I doubt the necessity. This is where we disagree, as I know you have mentioned this before. The solution I am proposing (which reverts the userspace injection method) would not require this ACK that you mention. The reason is that the synchronous nature of the injection takes care of things naturally. Here is how: With sync-injection, we can only move a vector from the IRR->ISR if RFLAGS.IF and !irq.pending. Once we inject, the vector will move to ISR and get queued in kernel (thereby setting an irq.pending bit). It may not inject to the guest immediately, but thats ok because no new vectors will be acked in the PIC until that previous vector is processed (because irq.pending will remain set). Therefore, we will never prematurely move IRR->ISR until the previous vector really is "in-service". Does this make sense? My suggestion assumes that a single vector can stay in ISR without actually being in service without ill effect (Todays v9 code could presumably have multiple vectors in ISR without being in service, which I admit is bogus). If this assumption is false, then I agree with you that we need extra intack I/F. What are you thoughts here? > > > Perhaps we are "ok" with that, but I was under the distinct > > impression from Avi et. al. that these variable levels of support > > were a design goal for debugging purposes. > > > > I would prefer that we just fix level-1 mode with the changes I > > mentioned instead of just disabling it (in addition to adding level-2 > > mode as Eddie is working on). > > > >> > >> This patch fixes this by introducing new VM level IOCTL > >> KVM_SET_ISA_IRQ_LEVEL & KVM_CREATE_PIC to avoid the ABI hole. The > >> original KVM_INTERRUPT is still there to be backward compatible. > >> > >> WIth this patch, old Qemu, new qemu (after this patch), old kvm, new > >> kvm can work in any combination. Both user level code and kernel > >> code will automatically decide the irq source base on irqchip > >> location (user or kernel). > >> > >> Some known issues (TODO): > >> 1: SVM support is on going. > >> The code for VMX to inject irq is same with Xen now since the > >> situation is same now (irqchip in hyprevisor), SVM code should be > >> able to directly reuse from Xen too. > >> 2: live migration break. > >> 3: Temply APIC support is removed in CPUID to wait for the merge > >> of in kernel APIC patch > > > > Note that you will need more than just the APIC patch. My patch only > > moves the LAPIC down. The IOAPIC and 8259s are still in userspace. > > Your patch only moves the 8259s down. This means there is a gap where > > the IOAPIC used to be that still needs to be addressed. > > Why do u think I/O APIC must be moved down too? A new IOCTL like > KVM_IOAPIC_MESSAGE can solve the interface issue IMO and quit > natual. I suppose, but it somewhat defeats the purpose IMO. Every pin in the 8259 that gets tickled implicitly means an IOAPIC pin was tickled also. Do we really want to go to userspace for that? Essentially what happens then is the PIT ends up having to go through userspace for every tick at that point. On the flip side, the IOAPIC model is very simple so I think it makes more sense to move it one to one with the PIC. Regards, -Greg ------------------------------------------------------------------------- This SF.net email is sponsored by DB2 Express Download DB2 Express C - the FREE version of DB2 express and take control of your XML. No limits. Just data. Click to get it now. http://sourceforge.net/powerbar/db2/ _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel