hi -
so i looked into the mysteries around why our IO-APIC and virtio-cons
works the way it does. i did this mostly through printks and looking
through the codebase. with ssh and scp, the turn-around for a
recompiled and rebooted linux vm was about 20 seconds.
here goes:
------------------------------
without the ISORs, linux thinks the rest of the IOAPIC pins are
disconnected:
(ISORs set for 1-4, for bus 0)
source -> gsi
0 -> 2
1 -> 1
3 -> 3
4 -> 4
sorted by gsi
1 -> 1
0 -> 2
3 -> 3
4 -> 4
setup_IO_APIC_irqs():
%ENABLING IO-APIC IRQs
%init IO_APIC IRQs
% apic 0 pin 0 not connected
1-4 here! these must be IOAPIC output, not source irqs
% apic 0 pin 5 not connected
% apic 0 pin 6 not connected
% apic 0 pin 7 not connected
% apic 0 pin 8 not connected
% apic 0 pin 9 not connected
% apic 0 pin 10 not connected
% apic 0 pin 11 not connected
% apic 0 pin 12 not connected
% apic 0 pin 13 not connected
% apic 0 pin 14 not connected
% apic 0 pin 15 not connected
% apic 0 pin 16 not connected
% apic 0 pin 17 not connected
% apic 0 pin 18 not connected
% apic 0 pin 19 not connected
% apic 0 pin 20 not connected
% apic 0 pin 21 not connected
% apic 0 pin 22 not connected
% apic 0 pin 23 not connected
questions:
-----------------------
what specifically did linux see to think that?
find_irq_entry()
is it something we're doing wrong?
we didn't tell it about any PCI devices, or anything that might be
connected there.
we didn't set the MADT PC_COMPAT flag (tried it, had no effect)
find_irq_entry in this case looks for type mp_INT. that gets set in a
few places. we don't do mp tables, right? so let's look at the ACPI
places. one of them:
mp_override_legacy_irq()
happens 4 times, once for each ISOR. sets mp_INT, ISA_BUS, and a few
other things.
the other place is
mp_config_acpi_legacy_irqs()
which we call, but nr_legacy_irqs == 0, so we do nothing!
why is it 0? because our hacks hard-coded it that way:
struct legacy_pic *legacy_pic = &null_legacy_pic; //&default_legacy_pic;
k, so no legacy pic means no default routes for any of the ioapic pins.
another place mp_INT is set is mpparse.c, but we don't call that
location. i didn't see any other places that set mp_INT.
so it looks like the IOAPIC isn't connected to anything. no legacy
(thanks to us), thus no default 1:1 mappings. then the only pins that
are connected are what we explicitly told it via the ISORs.
k, that answers that question. we need to wire the IOAPIC somehow.
now we told it that 4 pins are connected. in our current
setup, that is just:
source -> gsi
0 -> 2
1 -> 1
3 -> 3
4 -> 4
also, the variable 'pin' refers to the IOAPICs pins since we had pins
1-4 connected. if pin referred to source_irq, then pin 2 would not be
connected and pin 0 would be. that wouldn't really make sense anyways,
since the IOAPIC doesn't know about overrides. it just knows its input
pins and output vectors. ACPI tells us how it is all wired.
-------------------------
why is it that IRQ 26 gets mapped to slot 3 (the fourth slot)? given
what we know now, the IOAPIC only has its pins 1-4 set.
each of them gets pin_2_irq() (ret = the new IRQ):
%XME pin_2_irq idx 1, pin 1, gsi 1, ret 24
%XME pin_2_irq idx 0, pin 2, gsi 2, ret 25
%XME pin_2_irq idx 2, pin 3, gsi 3, ret 26
%XME pin_2_irq idx 3, pin 4, gsi 4, ret 27
at first, i thought idx was the bus's source_irq from the ISORs. but
that's 0-3, not {0, 1, 3, 4}. what the heck?
idx comes from find_irq_entry(). let's look closer at the setup:
%ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
%Int: type 0, pol 3, trig 1, bus 00, IRQ 00, APIC ID 0, APIC INT 02
^ this might be due to special casing of the timer
%ACPI: INT_SRC_OVR (bus 0 bus_irq 1 global_irq 1 dfl dfl)
%Int: type 0, pol 0, trig 0, bus 00, IRQ 01, APIC ID 0, APIC INT 01
%ACPI: INT_SRC_OVR (bus 0 bus_irq 3 global_irq 3 dfl dfl)
%Int: type 0, pol 0, trig 0, bus 00, IRQ 03, APIC ID 0, APIC INT 03
%ACPI: INT_SRC_OVR (bus 0 bus_irq 4 global_irq 4 dfl dfl)
%Int: type 0, pol 0, trig 0, bus 00, IRQ 04, APIC ID 0, APIC INT 04
'idx' is just the index in the mp_irqs[] array. nothing to do with
the actual overrides, which are fine.
oh, and pin == GSI in pin_2_irq() because this IOAPIC's gsi base = 0.
anyway, ret is the desc we get back, e.g. when virtio-cons does a
request_irq.
ret came from mp_map_pin_to_irq()
alloc_irq_from_domain()
__irq_domain_alloc_irqs()
virq = irq_domain_alloc_descs()
irq_alloc_descs_from, various macros:
__irq_alloc_descs()
in there, we have this block:
/*
* For interrupts which are freely allocated the
* architecture can force a lower bound to the @from
* argument. x86 uses this to exclude the GSI space.
*/
from = arch_dynirq_lower_bound(from);
that ends up setting the lowest IRQ to 24. from there, we just
allocate whichever irq is free, starting at 24. 24, then 25, then 26,
etc.
that 24 came from:
return ioapic_initialized ? ioapic_dynirq_base : gsi_top;
and it was gsi_top, which is set in mp_register_ioapic(), where it
looks like it is set to the IOAPIC's pins + GSI base + 1, so basically
the next IRQ available.
also in mp_map_pin_to_irq(), it looks like there is a case for
legacy/non-PCI IRQs, where we call alloc_isa_irq_from_domain() instead
of alloc_irq_from_domain(). (this is based on the nr_legacy_irqs,
which we set to 0 by our hack).
so anyway, here's what happens (probably):
-----------------------
1) we tell linux it has an ioapic, but it was hacked to not think it
was a legacy ioapic, thus it has no default pins.
2) we give it ISORs, so it sets up *those* pins on the IOAPIC. that's
all the devices that linux knows about.
3) when the IOAPIC inits, it needs Linux IRQs for each of the IOAPIC
pins. it calls mp_map_pin_to_irq.
4) since we hacked out the legacy support, those IOAPIC pins aren't
given their "usual" IRQ. they are now given IRQs starting at 24.
linux thinks those legacy GSIs are busy, and won't hand them out to
non-legacy interrupts
5) when linux tries to set the vector for these IRQs, it points back to
its original IOAPIC pin.
for example, IRQ 26 (virtio-cons) is IO-APIC 0, index 2
%XME pin_2_irq idx 2, pin 3, gsi 3, ret 26
recall that this was the third override (number 2), not the
source_irq. so this override was source 3 -> pin 3. not that source
matters much, you want to mess with the IOAPIC pin 3 regardless. the
ISOR just said that "what you thought was at source is wired to IOAPIC
pin destination".
so when linux tries to set the vector, it goes to the IOREDTBL entry
for pin3. AKA, 0x16 in the IOAPIC space.
the only reason virtio-cons got 26 was that we told it to use it. it
wasn't a global system irq. it happened to be the number that linux
assigned internally for the ISOR IRQs we gave it. basically, we tell
linux it has an IOAPIC pin set by having an override with global_irq =
the pin for that IOAPIC. (e.g., global_irq = 0 means pin 0 is set for
IOAPIC 0). then we lined up what we told virtio-cons so that it would
be tricked into using the IOAPIC for pin 0. for example, this patch
works for me: (just one override for 0, and subtract 24, not 23).
diff --git a/tests/vmm/vmrunkernel.c b/tests/vmm/vmrunkernel.c
index d00c441a7ae6..c043e8defdd4 100644
--- a/tests/vmm/vmrunkernel.c
+++ b/tests/vmm/vmrunkernel.c
@@ -116,16 +116,7 @@ struct acpi_madt_interrupt_override isor[] = {
*/
{.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
.length = sizeof(struct acpi_madt_interrupt_override)},
- .bus = 0, .source_irq = 0, .global_irq = 2, .inti_flags = 0},
- {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
- .length = sizeof(struct acpi_madt_interrupt_override)},
- .bus = 0, .source_irq = 1, .global_irq = 1, .inti_flags = 0},
- {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
- .length = sizeof(struct acpi_madt_interrupt_override)},
- .bus = 0, .source_irq = 3, .global_irq = 3, .inti_flags = 0},
- {.header = {.type = ACPI_MADT_TYPE_INTERRUPT_OVERRIDE,
- .length = sizeof(struct acpi_madt_interrupt_override)},
- .bus = 0, .source_irq = 4, .global_irq = 4, .inti_flags = 0},
+ .bus = 0, .source_irq = 0, .global_irq = 0, .inti_flags = 0},
};
@@ -195,7 +186,7 @@ static struct virtio_mmio_dev cons_mmio_dev = {
.poke_guest = virtio_poke_guest,
/* At the moment irq numbers cannot be below 24; this is a problem with
* the IOAPIC and Interrupt Source Override Structure. */
- .irq = 26,
+ .irq = 24,
};
static struct virtio_console_config cons_cfg;
diff --git a/user/vmm/ioapic.c b/user/vmm/ioapic.c
index 3f6d58caad6a..c515d067bf7f 100644
--- a/user/vmm/ioapic.c
+++ b/user/vmm/ioapic.c
@@ -110,7 +110,7 @@ static void ioapic_write(struct guest_thread *vm_thread,
int ix,
/* The first IRQ register starts at 0x10, and there are two
32-bit
* registers for each IRQ. The first 8 bits of the value
assigned to
* 'reg' is the interrupt vector. */
- irqreg = (vm->virtio_mmio_devices[i]->irq - IOAPIC_LAST_IRQ) *
2 + 0x10;
+ irqreg = (vm->virtio_mmio_devices[i]->irq - 24) * 2 + 0x10;
if (reg == irqreg && (value & 0xff) != 0) {
vm->virtio_mmio_devices[i]->vec = value & 0xff;
DPRINTF("irq vector for irq number %d is: %lx\n",
this seems super screwy, and will break if / when we change that
null_legacy_pic business.
say for instance that we actually supported the legacy pic. what is
the appropriate way to express IRQ vectors for virtio? my guess is
virtio-pci would use the usual PCI irq range (16-23), and since we are
using legacy irqs, that would get GSI == IRQ (for GSI < 24). unlike
what we have now, where GSI == 0 and IRQ == 26.
what does virtio-mmio do for interrupt/vectors?
http://lxr.free-electrons.com/source/Documentation/virtual/virtio-spec.txt?v=3.8
(grep mmio) has some stuff, but seems to default to PCI for things not
listed in Appendix X. perhaps that includes IRQ/vector setup.
--
You received this message because you are subscribed to the Google Groups
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.