Re: [akaros] Code Review - IOMMU and X2APIC support

Barret Rhoden Fri, 15 Jan 2016 14:37:40 -0800

Hi -

Thanks for the fixups.  It's coming along nicely.  Comments below:


On 2016-01-12 at 09:30 [email protected] wrote:
> Hi, I've made some changes to the virtualization support. This should
> work coupled with some changes in the vmm and a kernel cmdline
> parameter. As of now, I have linux working up to busybox while using
> the x2apic.

> From ca8d9d9544555ad1b692929d8c76e5f6ce64d934 Mon Sep 17 00:00:00 2001
> From: GanShun <[email protected]>
> Date: Wed, 16 Dec 2015 17:36:39 -0800
> Subject: Enabling X2APIC

>  static inline void send_self_ipi(uint8_t vector)
>  {
> -     write_mmreg32(LAPIC_IPI_ICR_LOWER, 0x00044000 | vector);
> -     lapic_wait_to_send();
> +     // TODO (ganshun): change to the X2APIC method
> +     apicsendipi(0x0000000000044000ULL | vector);
>  }

Can we do this now?  Perhaps just add LAPIC_SELF_IPI 0x83f (or base +
3f?) and wrmsr to it.


> --- a/kern/arch/x86/apic9.c
> +++ b/kern/arch/x86/apic9.c
> +static void __apic_ir_dump(uint64_t r);
> +
> +static void __apic_ir_dump(uint64_t r)
>  {
> +     int i;
>       uint32_t val;
> -     if (!apicbase)
> -             panic("apicrget: no apic");
> -     val = read_mmreg32(apicbase + r);
> +
> +     for (i = 1; i < 8; i++) {

Shouldn't this be i = 0?  o/w, we don't dump the first 32 bits of the
IRR/ISR.

> +             val = apicrget(r+i);
> +             if (val) {
> +                     printk("Register at range (%d,%d]: 0x%08x\n", (i*2+32),
> +                            i*2, val);
> +             }
> +     }
> +}
> +void apic_isr_dump(void)
> +{
> +     printk("ISR DUMP\n");
> +     __apic_ir_dump(0x10);
> +}
> +void apic_irr_dump(void)
> +{
> +     printk("IRR DUMP\n");
> +     __apic_ir_dump(0x20);
> +}

These look like debugging functions, but we should still change 0x10 and
0x20 be LAPIC_ISR and LAPIC_IRR.


> +uint32_t apicrget(uint64_t r)
> +{
> +     uint32_t val;
> +
> +     if (r >= 0x40)
> +             panic("%s: OUT OF BOUNDS: register 0x%x\n", __func__, r);
> +     if (r != LAPIC_SPURIOUS && r != LAPIC_TIMER_DIVIDE)
> +             printd("%s: Reading from register 0x%llx\n",
> +                    __func__, r);
> +     if (r == LAPIC_IPI_ICR_UPPER)
> +             val = (read_msr(X2APICBase + LAPIC_IPI_ICR_LOWER) >> 32);

I think we can get rid of this special-casing of ICR_UPPER, since no one
should be asking for it now that we're using the x2APIC.

> +     else
> +             val = read_msr(X2APICBase + r);
>       printd("apicrget: %s returns %p\n", apicregnames[r], val);
> +     if (r == LAPIC_ID) {
> +             printd("APIC ID: 0x%lx\n", val);
> +             printd("APIC LOGICAL ID: 0x%lx\n",
> +                    apicrget(LAPIC_LOGICAL_ID));
> +     }
>       return val;
>  }

You can probably remove the debugging stuff from apicrget and apicrput
now (and anywhere else around here).  

>  void apicinit(int apicno, uintptr_t pa, int isbp)

> +     //IOMMU INIT
> +     init_iommu();

This won't compile at this commit.  I think you add this function in a
later commit.

Also, this is a bad place for this function to be called.  apicinit()
gets called once per LAPIC.  I guess we're waiting for ACPI to find a
better place for this.  Somewhere in arch_init() might work in the
meantime.

> diff --git a/kern/arch/x86/ros/mmu64.h b/kern/arch/x86/ros/mmu64.h
> index 84a494159abb..e8865702fd19 100644
> --- a/kern/arch/x86/ros/mmu64.h
> +++ b/kern/arch/x86/ros/mmu64.h
> @@ -152,8 +152,7 @@ typedef struct x86_pgdir {
>  #define KERN_LOAD_ADDR  0xffffffffc0000000
>  /* Static kernel mappings */
>  #define APIC_SIZE            0x100000
> -#define LAPIC_BASE           (KERN_LOAD_ADDR - APIC_SIZE)
> -#define IOAPIC_BASE          (LAPIC_BASE - APIC_SIZE)
> +#define IOAPIC_BASE          (KERN_LOAD_ADDR - (2 * APIC_SIZE))

We don't need to keep the IOAPIC_BASE at 2 * APIC_SIZE.  We were
subtracting APIC_SIZE just because that was how much room we needed for
each mapping.  So:

-#define IOAPIC_BASE            (KERN_LOAD_ADDR - (2 * APIC_SIZE))
+#define IOAPIC_BASE            (KERN_LOAD_ADDR - APIC_SIZE)

> --- a/kern/arch/x86/trap.c
> +++ b/kern/arch/x86/trap.c
> @@ -498,6 +498,13 @@ static bool vector_is_irq(int apic_vec)
>   * are all mapped up at PIC1_OFFSET for the cpu / irq_handler. */
>  void handle_irq(struct hw_trapframe *hw_tf)
>  {
> +     //TEMPORARY HACK TO EOI THE I_POKE_CORE IRQ
> +     if (hw_tf->tf_trapno == I_POKE_CORE) {
> +             // Send eoi
> +             apicrput(LAPIC_EOI, 0);
> +             return;
> +     }

Let's just fix the POKE_HANDLER in assembly.  I can help with this.

> --- a/kern/arch/x86/trapentry64.S
> +++ b/kern/arch/x86/trapentry64.S
> @@ -321,7 +321,8 @@ IRQ_HANDLER(IRQ202, 234)
>  IRQ_HANDLER(IRQ203, 235)
>  IRQ_HANDLER(IRQ204, 236)
>  IRQ_HANDLER(IRQ205, I_TESTING)
> -POKE_HANDLER(IRQ206, I_POKE_CORE)
> +//POKE_HANDLER(IRQ206, I_POKE_CORE)
> +IRQ_HANDLER(IRQ206, I_POKE_CORE)

Same with here.

> From a89af018760271e22ef6b56ee1284462055d9d31 Mon Sep 17 00:00:00 2001
> From: GanShun <[email protected]>
> Date: Thu, 17 Dec 2015 10:17:49 -0800
> Subject: Added IOMMU initalization code and init_irte functionality

> +void init_iommu(void)
> +{
> +     uint64_t irtar, capability, ext_capability;
> +     uint64_t *irtarp;
> +     uint32_t version, status;
> +     int init = 0;
> +     int i;
> +
> +     if (rootp == NULL)

Should init_iommu() only ever be called once?  (probably, if it is
initializing all of the IOMMUS, or else it should take some sort of
parameter to initialize a specific IOMMU).

In either case, we probably aren't calling it multiple times on the same
IOMMU, so the rootp, iretp, and init checks shouldn't matter.  That
would clear up all of these ifs and the associated logic.

> +             rootp = get_cont_pages(0, KMALLOC_WAIT);

Is this the root of the IOMMU page table?

> +
> +     if (irtep == NULL) {
> +             irtep = get_cont_pages(8, KMALLOC_WAIT);

Is there a way to find out how big of an IRTE we need?  If you're aiming
for the max size, I think that's 128 bytes * 64K entries = 2^23 bytes =
2^11 pages.  (order 11 alloc).

I guess the limit is the number of interrupt sources behind the IOMMU?
(no need to answer that now btw, but that 8 should probably be an 11 to
go with the value below:)

> +             init = 1;
> +     }
> +     if (irtep != NULL && init == 1 && rootp != NULL) {
> +             irtar = (uint64_t)PADDR(irtep) & ~0xfff;
> +             irtar |= 0x80f;

This is saying there are 2^16 entries (the 0xf).

> +void init_irte(uint16_t irte_index, uint32_t dest_id, uint8_t vector,
> +               uint8_t delivery_mode) {
> +     if (irtep[irte_index*2] & PRESENT)

Throughout this function, you multiply the index by 2.  Is that because
irtep* is a u64*?  irtep* should be a struct irtep *, not a u64*.  Then
you don't have to do any of that multiplication, and you can just access
->low and ->high instead of irtep[index * 2 + 1].

> +             panic("TRIED TO ALLOCATE ALREADY PRESENT IRTE");
> +
> +     if (delivery_mode > DELIVERY_MODE_LAST || delivery_mode == 3 ||
> +         delivery_mode == 6) {
> +             panic("INVALID DELIVERY MODE!");
> +     }
> +
> +     irtep[irte_index*2+1] = 0;

Minor thing, but please put spaces around mathematical operators (here
and in other places). (if you keep these at all).

e.g.
-       irtep[irte_index*2+1] = 0;
+       irtep[irte_index * 2 + 1] = 0;

> +     irtep[irte_index*2] = 0;
> +
> +     irtep[irte_index*2] = PRESENT |
> +                           //TRIGGER_MODE |
> +                           //DESTINATION_MODE |
> +                           //REDIRECTION_HINT |
> +                           delivery_mode << 5 |
> +                           (uint64_t)vector << 16 |
> +                           (uint64_t)dest_id << 32;
> +
> +     // Hardcoded value of 0xf0ff for testing until Dan's ACPI code works
> +     irtep[irte_index*2+1] = SOURCE_ID(0xf0ff) |

I guess 0xf0ff is the BDF from a device on your testing machine?  We'll
need to change the API to this function (later, I guess) to take the
TBDF or something for the specific device we're remapping for.

> +                             SOURCE_ID_QUAL_ALL |
> +                             SOURCE_VALIDATION_ON;
> +
> +}
> +
> +void print_fault_regs(void)

I haven't had a chance to read through the section on translation faults
yet, so I don't know too much about this.

> +{
> +     int fault_status;
> +     int fault_index;
> +     uint64_t fault_reg_low, fault_reg_high;
> +
> +     fault_status = get_iommu_reg32(FAULT_STATUS_REG_OFFSET);
> +     if (fault_status & 0x0002) {
> +             fault_index = (fault_status >> 8) & 0xFF;
> +             printk("Fault Status Register: 0x%lx\n", fault_status);
> +
> +             fault_reg_low = get_iommu_reg64(fault_offset +
> +                                             (16 * fault_index));
> +             fault_reg_high = get_iommu_reg64(fault_offset + 8 +
> +                                              (16 * fault_index));
> +             while (fault_reg_high >> 63) {
> +                     printk("Fault Register at index 0x%lx Low: 0x%llx\n",
> +                            fault_index, fault_reg_low);
> +                     printk("Fault Register at index 0x%lx High: 0x%llx\n",
> +                            fault_index, fault_reg_high);
> +
> +                     // Clear the fault by writing back the 1.
> +                     set_iommu_reg64(fault_offset + 8 + (16 * fault_index),
> +                                     fault_reg_high);
> +
> +                     fault_index++;
> +                     fault_reg_low = get_iommu_reg64(fault_offset +
> +                                                     (16 * fault_index));
> +                     fault_reg_high = get_iommu_reg64(fault_offset + 8 +
> +                                                      (16 * fault_index));
> +
> +             }
> +     }
> +}
> +
> +inline uint64_t get_iommu_reg64(uint64_t reg)
> +{
> +     return *((uint64_t *)(DMAR_REG_ADDR + reg));

It's not really the DMAR_REG_ADDR is it?  It's a particular IOMMU's
register base address, which we find from the DMAR.  Since we may have
multiple IOMMUs, do we want these getter/setter functions to take a
struct iommu * or something?

> +}
> +
> +inline void set_iommu_reg64(uint64_t reg, uint64_t value)
> +{
> +     *((uint64_t *)(DMAR_REG_ADDR + reg)) = value;
> +}
> +
> +inline uint32_t get_iommu_reg32(uint64_t reg)
> +{
> +     return *((uint32_t *)(DMAR_REG_ADDR + reg));
> +}
> +
> +inline void set_iommu_reg32(uint64_t reg, uint32_t value)
> +{
> +     *((uint32_t *)(DMAR_REG_ADDR + reg)) = value;
> +}

For all of these helpers, they are just MMIO operations, so they can just
call {read,write}_mmreg{32,64}() under the hood.  read_mmreg64()
doesn't exist yet, but you can add it in another commit.

The nice thing about those helpers is that they also make sure to cast
to volatiles.  e.g.:

        static inline void write_mmreg32(uintptr_t reg, uint32_t val)
        {
            *((volatile uint32_t*)reg) = val;
        }

> +inline uint32_t get_gsr(void)
> +{
> +     return *((uint32_t *)(DMAR_REG_ADDR + GLOBAL_STATUS_REG_OFFSET));
> +}

Don't forget an MMIO helper here (o/w you're missing a volatile*).  This
volatile* will also help you in that while loop in set_gcr (o/w the
compiler might optimize it out).

> +
> +void set_gcr(uint32_t value)
> +{
> +     uint32_t status;
> +
> +     // Check reserved bits.
> +     if (value & 0x7FFFFF)
> +             panic("Setting reserved bits in Global Command Register");
> +
> +     status = get_gsr() | value;
> +     *((uint32_t *)(DMAR_REG_ADDR + GLOBAL_COMMAND_REG_OFFSET)) = status;

Don't forget an MMIO helper here (o/w missing a volatile*).

> +
> +     // Wait till the status bit reflects the change.
> +     do {
> +             status = get_gsr();
> +     } while (status != (status | value));
> +}

As we mentioned offline, if we have an IOMMU, are we guaranteed that
this op will eventually complete?  If so, we can have checks somewhere
else that prevent us getting here.  o/w, my current qemu locks up here.

> diff --git a/kern/arch/x86/iommu.h b/kern/arch/x86/iommu.h

> +#define PRESENT                              (1 << 0)
> +#define FAULT_PROCESSING_DISABLE     (1 << 1)
> +#define DESTINATION_MODE             (1 << 2)
> +#define REDIRECTION_HINT             (1 << 3)
> +#define TRIGGER_MODE                 (1 << 4)
> +
> +/* Delivery mode is 3 bits. */
> +#define DELIVERY_MODE_FIXED          0
> +#define DELIVERY_MODE_LOWEST_PRIORITY        1
> +#define DELIVERY_MODE_SMI            2
> +#define DELIVERY_MODE_NMI            4
> +#define DELIVERY_MODE_INIT           5
> +#define DELIVERY_MODE_EXT_INT                7
> +#define DELIVERY_MODE_LAST           DELIVERY_MODE_EXT_INT

We're going to run into namespacing some with many of the #defines in
this file.  iommu.h will probably get widely included (transitive
includes), so having something generic like #define PRESENT might be a
problem.

Can you prefix some of the #defines in this file with something like
IOMMU_?  (where appropriate).  Things like the delivery mode probably
doesn't need them (esp since they aren't specific to the IOMMU, but are
more of an APIC/IPI thing).

> +
> +/* Available area is 4 bits */
> +#define AVAILABLE                    (0 << 8)
> +
> +#define IRTE_MODE                    (1 << 15)
> +#define IRTE_VECTOR(x)                       ((x) << 16)

Also, watch your formatting / alignment within this file.  In an editor,
the right column is all over the place.


> +/* Temporary define for testing. We should dynamically get the right addr */
> +
> +//#define DMAR_REG_PADDR                     0xfbffe000
> +#define DMAR_REG_ADDR                        0xffffffff80000000

Once we get them dynamically, we'll also not want these to be
#defines.  Maybe put that in an iommu struct.  That'd be a good place
for the globals rootp and irtep too.

> +inline void set_iommu_reg64(uint64_t reg, uint64_t value);
> +
> +inline uint32_t get_iommu_reg32(uint64_t reg);
> +
> +inline void set_iommu_reg32(uint64_t reg, uint32_t value);
> +
> +inline uint32_t get_gsr(void);

Our usual style is to have static inlines with the code in the header
file.  I'm pretty sure putting an inline like this in a header (and with
the body of the function in a C file) does nothing.

> diff --git a/kern/arch/x86/pmap64.c b/kern/arch/x86/pmap64.c
> index 0008f6ee2aed..847dbdd1295f 100644
> --- a/kern/arch/x86/pmap64.c
> +++ b/kern/arch/x86/pmap64.c
> @@ -19,6 +19,7 @@
>  #include <arch/arch.h>
>  #include <arch/mmu.h>
>  #include <arch/apic.h>
> +#include <arch/iommu.h>
>  #include <error.h>
>  #include <sys/queue.h>
>  #include <atomic.h>
> @@ -466,6 +467,8 @@ void vm_init(void)
>       }
>       /* For the LAPIC and IOAPIC, we use PAT (but not *the* PAT flag) to make
>        * these type UC */
> +     map_segment(boot_pgdir, DMAR_REG_ADDR, PGSIZE, DMAR_REG_PADDR,
> +                 PTE_PCD | PTE_PWT | PTE_KERN_RW | PTE_G, max_jumbo_shift);

What is the intent here?  Does the DMAR table need to be mapped at a
fixed location, and does it need to have the same PTE settings as the
old LAPIC/IOAPIC?

If a static mapping is not needed, then we'd be better off using a
dynamic interface:

uintptr_t vmap_pmem(uintptr_t paddr, size_t nr_bytes)

        or if you want caching disabled for accesses (probably):

uintptr_t vmap_pmem_nocache(uintptr_t paddr, size_t nr_bytes)

Also, if you just want a kernel virtual address for a physical address,
then you can just call KADDR(paddr).  (or KADDR_NOCHECK if it's a weird
paddr).

> From b54f47a1605cf659393018ee757eb8bc95da1736 Mon Sep 17 00:00:00 2001
> From: GanShun <[email protected]>
> Date: Thu, 17 Dec 2015 13:39:13 -0800
> Subject: Modified IOAPIC to emit interrupts in remappable format and init irte

> diff --git a/kern/arch/x86/ioapic.c b/kern/arch/x86/ioapic.c

> @@ -532,6 +537,7 @@ int bus_irq_setup(struct irq_handler *irq_h)
>       struct Rbus *rbus;
>       struct Rdt *rdt;
>       int busno, devno, vecno;
> +     uint32_t high, low;
>       struct pci_device *pcidev;
>  
>       if (!ioapic_exists()) {
> @@ -645,7 +651,13 @@ int bus_irq_setup(struct irq_handler *irq_h)
>       rdt->enabled++;
>       rdt->hi = 0;                    /* route to 0 by default */

Since you set rdt->hi below, you can remove this line.  But the comment
(about routing to 0) should stay, since that's still being set.

>       rdt->lo |= Pm | MTf;
> +     rdt->hi = irq_h->dev_irq << (49-32) | 1 << (48-32);
> +
> +     init_irte(irq_h->dev_irq, 0, vecno, DELIVERY_MODE_FIXED);
> +
>       rtblput(rdt->apic, rdt->intin, rdt->hi, rdt->lo);
> +     rtblget(rdt->apic, rdt->intin, &high, &low);
> +     print_fault_regs();
>       vecno = rdt->lo & 0xff;
>       spin_unlock(&rdt->apic->lock);
>  

Thanks,

Barret

-- 
You received this message because you are subscribed to the Google Groups 
"Akaros" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: [akaros] Code Review - IOMMU and X2APIC support

Reply via email to