we need to get rid of that null_legacy_pic. I think we're ready to see how much that breaks. I turned it off because I was so confused, but we're working our way to a patch free kernel and this is a good next step.
On Wed, Jun 22, 2016 at 11:20 AM Barret Rhoden <[email protected]> wrote: > hi - > > so i looked into the mysteries around why our IO-APIC and virtio-cons > works the way it does. i did this mostly through printks and looking > through the codebase. with ssh and scp, the turn-around for a > recompiled and rebooted linux vm was about 20 seconds. > > here goes: > > ------------------------------ > > without the ISORs, linux thinks the rest of the IOAPIC pins are > disconnected: > > (ISORs set for 1-4, for bus 0) > source -> gsi > 0 -> 2 > 1 -> 1 > 3 -> 3 > 4 -> 4 > > sorted by gsi > 1 -> 1 > 0 -> 2 > 3 -> 3 > 4 -> 4 > > > setup_IO_APIC_irqs(): > > %ENABLING IO-APIC IRQs > %init IO_APIC IRQs > % apic 0 pin 0 not connected > > 1-4 here! these must be IOAPIC output, not source irqs > > % apic 0 pin 5 not connected > % apic 0 pin 6 not connected > % apic 0 pin 7 not connected > % apic 0 pin 8 not connected > % apic 0 pin 9 not connected > % apic 0 pin 10 not connected > % apic 0 pin 11 not connected > % apic 0 pin 12 not connected > % apic 0 pin 13 not connected > % apic 0 pin 14 not connected > % apic 0 pin 15 not connected > % apic 0 pin 16 not connected > % apic 0 pin 17 not connected > % apic 0 pin 18 not connected > % apic 0 pin 19 not connected > % apic 0 pin 20 not connected > % apic 0 pin 21 not connected > % apic 0 pin 22 not connected > % apic 0 pin 23 not connected > > questions: > ----------------------- > what specifically did linux see to think that? > > find_irq_entry() > > is it something we're doing wrong? > > we didn't tell it about any PCI devices, or anything that might be > connected there. > > we didn't set the MADT PC_COMPAT flag (tried it, had no effect) > > find_irq_entry in this case looks for type mp_INT. that gets set in a > few places. we don't do mp tables, right? so let's look at the ACPI > places. one of them: > > mp_override_legacy_irq() > > happens 4 times, once for each ISOR. sets mp_INT, ISA_BUS, and a few > other things. > > the other place is > > mp_config_acpi_legacy_irqs() > > which we call, but nr_legacy_irqs == 0, so we do nothing! > > why is it 0? because our hacks hard-coded it that way: > > struct legacy_pic *legacy_pic = &null_legacy_pic; //&default_legacy_pic; > > k, so no legacy pic means no default routes for any of the ioapic pins. > > another place mp_INT is set is mpparse.c, but we don't call that > location. i didn't see any other places that set mp_INT. > > so it looks like the IOAPIC isn't connected to anything. no legacy > (thanks to us), thus no default 1:1 mappings. then the only pins that > are connected are what we explicitly told it via the ISORs. > > k, that answers that question. we need to wire the IOAPIC somehow. > now we told it that 4 pins are connected. in our current > setup, that is just: > > source -> gsi > 0 -> 2 > 1 -> 1 > 3 -> 3 > 4 -> 4 > > also, the variable 'pin' refers to the IOAPICs pins since we had pins > 1-4 connected. if pin referred to source_irq, then pin 2 would not be > connected and pin 0 would be. that wouldn't really make sense anyways, > since the IOAPIC doesn't know about overrides. it just knows its input > pins and output vectors. ACPI tells us how it is all wired. > > > ------------------------- > why is it that IRQ 26 gets mapped to slot 3 (the fourth slot)? given > what we know now, the IOAPIC only has its pins 1-4 set. > > each of them gets pin_2_irq() (ret = the new IRQ): > > %XME pin_2_irq idx 1, pin 1, gsi 1, ret 24 > %XME pin_2_irq idx 0, pin 2, gsi 2, ret 25 > %XME pin_2_irq idx 2, pin 3, gsi 3, ret 26 > %XME pin_2_irq idx 3, pin 4, gsi 4, ret 27 > > at first, i thought idx was the bus's source_irq from the ISORs. but > that's 0-3, not {0, 1, 3, 4}. what the heck? > > idx comes from find_irq_entry(). let's look closer at the setup: > > %ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > %Int: type 0, pol 3, trig 1, bus 00, IRQ 00, APIC ID 0, APIC INT 02 > ^ this might be due to special casing of the timer > > %ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 dfl dfl) > %Int: type 0, pol 0, trig 0, bus 00, IRQ 01, APIC ID 0, APIC INT 01 > > %ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 3 dfl dfl) > %Int: type 0, pol 0, trig 0, bus 00, IRQ 03, APIC ID 0, APIC INT 03 > > %ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 4 dfl dfl) > %Int: type 0, pol 0, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 04 > > 'idx' is just the index in the mp_irqs[] array. nothing to do with > the actual overrides, which are fine. > > oh, and pin == GSI in pin_2_irq() because this IOAPIC's gsi base = 0. > > anyway, ret is the desc we get back, e.g. when virtio-cons does a > request_irq. > > ret came from mp_map_pin_to_irq() > alloc_irq_from_domain() > __irq_domain_alloc_irqs() > virq = irq_domain_alloc_descs() > irq_alloc_descs_from, various macros: > __irq_alloc_descs() > > in there, we have this block: > > /* > * For interrupts which are freely allocated the > * architecture can force a lower bound to the @from > * argument. x86 uses this to exclude the GSI space. > */ > from = arch_dynirq_lower_bound(from); > > that ends up setting the lowest IRQ to 24. from there, we just > allocate whichever irq is free, starting at 24. 24, then 25, then 26, > etc. > > that 24 came from: > return ioapic_initialized ? ioapic_dynirq_base : gsi_top; > > and it was gsi_top, which is set in mp_register_ioapic(), where it > looks like it is set to the IOAPIC's pins + GSI base + 1, so basically > the next IRQ available. > > also in mp_map_pin_to_irq(), it looks like there is a case for > legacy/non-PCI IRQs, where we call alloc_isa_irq_from_domain() instead > of alloc_irq_from_domain(). (this is based on the nr_legacy_irqs, > which we set to 0 by our hack). > > > so anyway, here's what happens (probably): > ----------------------- > 1) we tell linux it has an ioapic, but it was hacked to not think it > was a legacy ioapic, thus it has no default pins. > > 2) we give it ISORs, so it sets up *those* pins on the IOAPIC. that's > all the devices that linux knows about. > > 3) when the IOAPIC inits, it needs Linux IRQs for each of the IOAPIC > pins. it calls mp_map_pin_to_irq. > > 4) since we hacked out the legacy support, those IOAPIC pins aren't > given their "usual" IRQ. they are now given IRQs starting at 24. > linux thinks those legacy GSIs are busy, and won't hand them out to > non-legacy interrupts > > 5) when linux tries to set the vector for these IRQs, it points back to > its original IOAPIC pin. > > for example, IRQ 26 (virtio-cons) is IO-APIC 0, index 2 > > %XME pin_2_irq idx 2, pin 3, gsi 3, ret 26 > > recall that this was the third override (number 2), not the > source_irq. so this override was source 3 -> pin 3. not that source > matters much, you want to mess with the IOAPIC pin 3 regardless. the > ISOR just said that "what you thought was at source is wired to IOAPIC > pin destination". > > so when linux tries to set the vector, it goes to the IOREDTBL entry > for pin3. AKA, 0x16 in the IOAPIC space. > > the only reason virtio-cons got 26 was that we told it to use it. it > wasn't a global system irq. it happened to be the number that linux > assigned internally for the ISOR IRQs we gave it. basically, we tell > linux it has an IOAPIC pin set by having an override with global_irq = > the pin for that IOAPIC. (e.g., global_irq = 0 means pin 0 is set for > IOAPIC 0). then we lined up what we told virtio-cons so that it would > be tricked into using the IOAPIC for pin 0. for example, this patch > works for me: (just one override for 0, and subtract 24, not 23). > > diff --git a/tests/vmm/vmrunkernel.c b/tests/vmm/vmrunkernel.c > index d00c441a7ae6..c043e8defdd4 100644 > --- a/tests/vmm/vmrunkernel.c > +++ b/tests/vmm/vmrunkernel.c > @@ -116,16 +116,7 @@ struct acpi_madt_interrupt_override isor[] = { > */ > {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, > .length = sizeof(struct acpi_madt_interrupt_override)}, > - .bus = 0, .source_irq = 0, .global_irq = 2, .inti_flags = 0}, > - {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, > - .length = sizeof(struct acpi_madt_interrupt_override)}, > - .bus = 0, .source_irq = 1, .global_irq = 1, .inti_flags = 0}, > - {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, > - .length = sizeof(struct acpi_madt_interrupt_override)}, > - .bus = 0, .source_irq = 3, .global_irq = 3, .inti_flags = 0}, > - {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE, > - .length = sizeof(struct acpi_madt_interrupt_override)}, > - .bus = 0, .source_irq = 4, .global_irq = 4, .inti_flags = 0}, > + .bus = 0, .source_irq = 0, .global_irq = 0, .inti_flags = 0}, > }; > > > @@ -195,7 +186,7 @@ static struct virtio_mmio_dev cons_mmio_dev = { > .poke_guest = virtio_poke_guest, > /* At the moment irq numbers cannot be below 24; this is a problem > with > * the IOAPIC and Interrupt Source Override Structure. */ > - .irq = 26, > + .irq = 24, > }; > > static struct virtio_console_config cons_cfg; > diff --git a/user/vmm/ioapic.c b/user/vmm/ioapic.c > index 3f6d58caad6a..c515d067bf7f 100644 > --- a/user/vmm/ioapic.c > +++ b/user/vmm/ioapic.c > @@ -110,7 +110,7 @@ static void ioapic_write(struct guest_thread > *vm_thread, int ix, > /* The first IRQ register starts at 0x10, and there are > two 32-bit > * registers for each IRQ. The first 8 bits of the value > assigned to > * 'reg' is the interrupt vector. */ > - irqreg = (vm->virtio_mmio_devices[i]->irq - > IOAPIC_LAST_IRQ) * 2 + 0x10; > + irqreg = (vm->virtio_mmio_devices[i]->irq - 24) * 2 + 0x10; > if (reg == irqreg && (value & 0xff) != 0) { > vm->virtio_mmio_devices[i]->vec = value & 0xff; > DPRINTF("irq vector for irq number %d is: %lx\n", > > > this seems super screwy, and will break if / when we change that > null_legacy_pic business. > > say for instance that we actually supported the legacy pic. what is > the appropriate way to express IRQ vectors for virtio? my guess is > virtio-pci would use the usual PCI irq range (16-23), and since we are > using legacy irqs, that would get GSI == IRQ (for GSI < 24). unlike > what we have now, where GSI == 0 and IRQ == 26. > > what does virtio-mmio do for interrupt/vectors? > > http://lxr.free-electrons.com/source/Documentation/virtual/virtio-spec.txt?v=3.8 > (grep mmio) has some stuff, but seems to default to PCI for things not > listed in Appendix X. perhaps that includes IRQ/vector setup. > > -- > You received this message because you are subscribed to the Google Groups > "Akaros" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "Akaros" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. For more options, visit https://groups.google.com/d/optout.
