Re: [ORLinux] Hardware assisted tlb reload in mor1kx

Jonas Bonn Mon, 29 Jul 2013 01:20:12 -0700

Hi Stefan,

On 28 July 2013 06:02, Stefan Kristiansson
<stefan.kristians...@saunalahti.fi> wrote:
> Good news everybody!
>
> We've got some hardware tlb reload going on in the hottest OpenRISC 1000
> implementation there is, no more wasting instructions on tlb miss exceptions
> when running Linux.


Grand!

>
> As a rough estimate (by looking at simulation waveforms and comparing
> the time spent in the tlb miss exception handler and the time spent when
> doing a hw reload), the hardware tlb reload should be about 7.5 times faster
> than the software reload.
> The hardware reload isn't completely optimized, so you could still shave off
> a couple of cycles there.
> Perhaps that is true for the tlb miss handler in Linux too, so the rough
> estimate is probably a good enough indicator of what kind of speedup we
> can expect from this.

Walking the page table is pretty much the same operation whether it be
in software or hardware... the savings are the context switch
associated with the exception handler.
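
To make the "same operation" concrete, here is a minimal sketch of a two-level walk in C. It assumes the 8K pages mentioned in this thread and PRESENT in bit 0 as in the Linux pte layout; the 8-bit/11-bit index split is an assumption about how the Linux port divides the address, and `phys_mem`, `phys_read` and `walk` are hypothetical names used only for this illustration, not kernel or mor1kx code.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   13u                      /* 8K pages, as in the thread */
#define PTE_BITS     11u                      /* assumed: 2048 ptes per table */
#define PGDIR_SHIFT  (PAGE_SHIFT + PTE_BITS)  /* assumed: top 8 bits index the pgd */
#define PTE_PRESENT  0x1u                     /* bit 0 of the Linux pte */

/* Hypothetical flat model of physical memory, addressed in 32-bit words. */
typedef struct {
    const uint32_t *words;
    size_t nwords;
} phys_mem;

/* Read one word; out-of-range addresses read as 0 (i.e. "not present"). */
static uint32_t phys_read(const phys_mem *m, uint32_t paddr)
{
    size_t idx = paddr / 4u;
    return idx < m->nwords ? m->words[idx] : 0u;
}

/*
 * Two-level walk. Returns 1 and stores the pte on success, 0 when the
 * walk hits an empty pgd entry or a pte without PRESENT set.
 */
static int walk(const phys_mem *m, uint32_t pgd_base, uint32_t vaddr,
                uint32_t *pte_out)
{
    uint32_t pgd_idx = vaddr >> PGDIR_SHIFT;
    uint32_t pte_idx = (vaddr >> PAGE_SHIFT) & ((1u << PTE_BITS) - 1u);

    uint32_t pgd_entry = phys_read(m, pgd_base + 4u * pgd_idx);
    if (pgd_entry == 0u)
        return 0;

    /* The pgd entry is taken to hold the page-aligned address of a pte table. */
    uint32_t pte_base = pgd_entry & ~((1u << PAGE_SHIFT) - 1u);
    uint32_t pte = phys_read(m, pte_base + 4u * pte_idx);
    if (!(pte & PTE_PRESENT))
        return 0;

    *pte_out = pte;
    return 1;
}
```

Either way it is two dependent loads plus a little bit-twiddling; the software handler additionally pays for exception entry, register save/restore and the return.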

>
> Another rough estimate of how much time is spent in the tlb miss vectors
> was done by running 'gcc hello_world.c -o hello_world' in the jor1k
> emulator (http://s-macke.github.io/jor1k/) and by using the stats from that
> we saw that (momentarily) roughly up to 25% of the time was spent in the
> dtlb miss exception handler.
> This could of course also be improved by increasing the number of sets and
> ways used in the mmus, but that's another topic that might be addressed in
> the future.
>

Process start-up is always going to be dominated by a flood of
TLB-misses in order to populate the initially clean TLB.  Similarly
after a context switch where the TLB has been flushed.  There are ways
to mitigate this, with trade-offs:  address space indexes to allow the
TLB to be "shared" between contexts (and thus not flushed);
speculatively preloading the TLB but at the cost of possibly flushing
out other entries that might be needed shortly; et cetera.
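
As a rough illustration of the address-space-index idea, the sketch below models a direct-mapped TLB whose entries carry a context tag, so a context switch only changes the current tag instead of flushing. Everything here (the `tlb` type, the 64-set size, the 8-bit tag, the function names) is hypothetical and does not describe mor1kx or the arch spec.

```c
#include <stdint.h>

#define PAGE_SHIFT 13u   /* 8K pages, as in the thread */
#define TLB_SETS   64u   /* hypothetical size */

typedef struct {
    uint32_t vpn;
    uint32_t ppn;
    uint8_t  asid;   /* context tag stored with the translation */
    uint8_t  valid;
} tlb_entry;

typedef struct {
    tlb_entry set[TLB_SETS];
    uint8_t   cur_asid;
} tlb;

/* Install a translation, tagged with the currently running context. */
static void tlb_fill(tlb *t, uint32_t vaddr, uint32_t ppn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry *e = &t->set[vpn % TLB_SETS];

    e->vpn = vpn;
    e->ppn = ppn;
    e->asid = t->cur_asid;
    e->valid = 1;
}

/* Hit only if both the page number and the context tag match. */
static int tlb_lookup(const tlb *t, uint32_t vaddr, uint32_t *ppn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    const tlb_entry *e = &t->set[vpn % TLB_SETS];

    if (!e->valid || e->vpn != vpn || e->asid != t->cur_asid)
        return 0;
    *ppn = e->ppn;
    return 1;
}

/* Context switch: no flush, just change the active tag. */
static void tlb_switch(tlb *t, uint8_t asid)
{
    t->cur_asid = asid;
}
```

Entries of the previous context survive the switch and become visible again when that context is rescheduled, at the price of extra tag bits per entry and a wider compare.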

> As always, you can find it in the github repos at:
> https://github.com/openrisc/mor1kx
>
> But before we bring out the champagne and start celebrating, some notes about
> the implementation that need some discussion.
>
> First, it doesn't exactly follow the arch specification's definition of the
> pagetable entries (pte), instead it uses the pte layout that our Linux port
> defines.
>
> Let me illustrate the differences.
> or1k arch spec pte layout:
> | 31 ... 10 | 9 | 8 ... 6 | 5 | 4 | 3 | 2 | 1 | 0 |
> |    PPN    | L |PP INDEX | D | A |WOM|WBC|CI |CC |
>

We have 8K pages... why so many bits for the PPN?  Bits 31, 30, and 29
are never used?


> Linux pte layout:
> | 31 ... 12 |  11  | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |   0   |
> |    PPN    |SHARED|EXEC|SWE|SRE|UWE|URE| D | A |WOM|WBC|CI |PRESENT|
>
> The biggest difference is that the arch spec defines a separate register
> (xMMUPR) which holds a table of protection bits, and the PP INDEX field
> of the pte is used to pick out the "right" protection flags from that.
> In our Linux port on the other hand, it has been chosen to not follow
> this and embed the protection bits straight into the pte (which of
> course is perfectly fine as it was designed for software tlb reload).
> So, the question is, should we change Linux to be compliant with the
> arch spec's definition of the ptes and start using a PP index field or
> change the arch spec to allow usage of the Linux definition?

What are the protection combinations that are actually used:

SR|SW|SX /* really? */
SR|SW
SR|SX
SR

/* We may want to drop the SX's here */
SR|SW|SX|UR|UW|UX
SR|SW|SX|UR|UX
SR|SW|SX|UR|UW
SR|SW|SX|UR

Is that all?  If yes, then SR is always set, and UR is always set for
user pages.

So we have:

USERPAGE?
WRITABLE?
EXECUTABLE?

...that's 3 bits, which maps nicely into the 3 bits available for PP
INDEX.  So the hardware and software implementations aren't
contradictory there.
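
A minimal sketch of that mapping, assuming the eight combinations listed above really are the complete set. The `P_*` flag values are symbolic and are not the xMMUPR bit encoding, and `pp_index`/`pp_prot` are hypothetical helper names.

```c
#include <stdint.h>

/* Symbolic permission flags for illustration; NOT the xMMUPR encoding. */
#define P_SR (1u << 0)
#define P_SW (1u << 1)
#define P_SX (1u << 2)
#define P_UR (1u << 3)
#define P_UW (1u << 4)
#define P_UX (1u << 5)

/* Pack USERPAGE / WRITABLE / EXECUTABLE into a 3-bit PP INDEX. */
static unsigned pp_index(int user, int writable, int executable)
{
    return ((user ? 1u : 0u) << 2) |
           ((writable ? 1u : 0u) << 1) |
           (executable ? 1u : 0u);
}

/*
 * What the protection table entry for a given index would contain:
 * SR is always set; user pages get full supervisor access plus UR,
 * matching the combinations listed above.
 */
static uint32_t pp_prot(unsigned index)
{
    int user       = (index >> 2) & 1u;
    int writable   = (index >> 1) & 1u;
    int executable = index & 1u;
    uint32_t prot  = P_SR;

    if (user)
        prot |= P_SW | P_SX | P_UR;
    if (writable)
        prot |= user ? P_UW : P_SW;
    if (executable)
        prot |= user ? P_UX : P_SX;
    return prot;
}
```

Filling the eight protection-table slots with `pp_prot(0)` through `pp_prot(7)` once at boot would then let each pte carry only the index.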

L ("link") isn't interesting for the software implementation, so
reusing that for SHARED is fine there, but the HW implementation wants
L... where does SHARED go in that case and, furthermore, what is it
even used for?  Somebody please check what that SHARED flag is doing.

Finally, we play games with the CC bit since we don't have an SMP
implementation of OpenRISC.  It's used to indicate that a page is
swapped out.  And the WBC bit is aliased to distinguish page cache via
the PAGE_FILE flag.  For the HW implementation we can't do this, so
where do we put these?  Bits 31 and 30 and have the HW mapper mask
them out?
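
A sketch of what "mask them out" could look like, purely as an assumption: the two software-only flags move to bits 31 and 30, and the hardware walker ignores those bits when it consumes the pte. The names and bit assignments are hypothetical, and this only works if no physical page number ever needs those two bits.

```c
#include <stdint.h>

/* Hypothetical software-only flags parked in the top two pte bits. */
#define PTE_SW_SWAPPED (1u << 31)  /* currently aliased onto CC */
#define PTE_SW_FILE    (1u << 30)  /* currently aliased onto WBC */
#define PTE_SW_MASK    (PTE_SW_SWAPPED | PTE_SW_FILE)

/* The pte as the hardware walker would see it, software bits cleared. */
static uint32_t pte_hw_view(uint32_t pte)
{
    return pte & ~PTE_SW_MASK;
}
```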

>
> Second, naturally there are a couple of changes needed to Linux for this to
> work.
> The changes are minor but need commenting before proper patches are sent out.
> The full diff is available at the end of this mail, but I'll first comment on
> the changes to each file.
>
> arch/openrisc/include/asm/spr_defs.h:
> The defines for the bitfields of xMMUCR are wrong in all of our spr_defs.h,
> I tried to dig into where those defines come from, but both the arch spec
> and spr_defs.h have been different since the beginning of time (or as long
> back as the commit histories date back, some time in year 2000).
>

I think that file has even more errors than that.  Didn't somebody fix
this file up in or1ksim but not sync the kernel version?

> arch/openrisc/kernel/head.S:
> The implementation in mor1kx works such that if the xMMUCR register is 0,
> it will generate tlb miss exceptions, so we have to make sure that it
> is zero when the MMUs are enabled, so that the boot tlb miss handlers are
> used until paging is set up.

I think that's a sound solution... requires a minor documentation
change to the arch spec.  We might be able to do even better though
and set up the PTEs early so that the boot handlers aren't needed at
all.
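
For reference, a small sketch of the register convention being discussed, using the PTBP mask from the patch in this mail (0xfffffc00). The helper names are hypothetical, and treating "PTBP field is zero" as "hardware refill disabled" is my reading of the description rather than something checked against the RTL.

```c
#include <stdint.h>

#define MMUCR_PTBP 0xfffffc00u  /* Page Table Base Pointer field, from the patch */

/* Value to write to xMMUCR for a pgd at the given physical address. */
static uint32_t mmucr_value(uint32_t pgd_phys)
{
    return pgd_phys & MMUCR_PTBP;
}

/* Assumed convention: a zero base pointer means "use tlb miss exceptions". */
static int hw_refill_enabled(uint32_t mmucr)
{
    return (mmucr & MMUCR_PTBP) != 0u;
}
```

One consequence worth writing into the spec change: with this convention a pgd located at physical address 0 cannot be told apart from "disabled".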

>
> arch/openrisc/mm/init.c:
> arch/openrisc/mm/tlb.c:
> The correct value of the pagetable base pointer is updated to the xMMUCR
> registers right after paging is initially set up and on each switch_mm.
>
> arch/openrisc/mm/fault.c:
> do_pagefault is called a bit differently when it is called from the pagefault
> exception vectors and when it is called from the tlb miss exception vectors.
> I've put in a hack there to make that difference disappear, but this has
> to be addressed properly and as I see it there are two ways.

>
> 1) Do the necessary checks in do_pagefault to see if it should handle a
>    protection fault, or a missing page fault.

I think this is the right approach.

> 2) Make mor1kx generate a tlb miss exception instead of a pagefault when the
>    pte table pointer is zero or the PRESENT bit is not set.
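
A minimal sketch of the check that option 1 implies, assuming the bit positions of the Linux pte layout shown earlier in this mail. `classify_fault` and the two enums are hypothetical names; this is not the actual do_page_fault code, and treating EXEC as a single bit for both user and supervisor is a simplification.

```c
#include <stdint.h>

/* Bit positions taken from the Linux pte layout quoted in this thread. */
#define PTE_PRESENT (1u << 0)
#define PTE_URE     (1u << 6)
#define PTE_UWE     (1u << 7)
#define PTE_SRE     (1u << 8)
#define PTE_SWE     (1u << 9)
#define PTE_EXEC    (1u << 10)

enum access     { ACC_READ, ACC_WRITE, ACC_EXEC };
enum fault_kind { FAULT_NONE, FAULT_MISSING, FAULT_PROTECTION };

/* Decide whether a fault is a missing page or a protection violation. */
static enum fault_kind classify_fault(uint32_t pte, enum access acc, int user)
{
    uint32_t need;

    if (!(pte & PTE_PRESENT))
        return FAULT_MISSING;

    switch (acc) {
    case ACC_READ:
        need = user ? PTE_URE : PTE_SRE;
        break;
    case ACC_WRITE:
        need = user ? PTE_UWE : PTE_SWE;
        break;
    default:
        need = PTE_EXEC;
        break;
    }
    return (pte & need) ? FAULT_NONE : FAULT_PROTECTION;
}
```
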
>
> Some thoughts and comments on those issues, please!
>
> Stefan
>
> --- >8 ---
> diff --git a/arch/openrisc/include/asm/spr_defs.h b/arch/openrisc/include/asm/spr_defs.h
> index 5dbc668..1d20915 100644
> --- a/arch/openrisc/include/asm/spr_defs.h
> +++ b/arch/openrisc/include/asm/spr_defs.h
> @@ -226,19 +226,15 @@
>   * Bit definitions for the Data MMU Control Register
>   *
>   */
> -#define SPR_DMMUCR_P2S    0x0000003e  /* Level 2 Page Size */
> -#define SPR_DMMUCR_P1S    0x000007c0  /* Level 1 Page Size */
> -#define SPR_DMMUCR_VADDR_WIDTH 0x0000f800  /* Virtual ADDR Width */
> -#define SPR_DMMUCR_PADDR_WIDTH 0x000f0000  /* Physical ADDR Width */
> +#define SPR_DMMUCR_PTBP           0xfffffc00  /* Page Table Base Pointer */
> +#define SPR_DMMUCR_DTF    0x00000001  /* DTLB Flush */
>
>  /*
>   * Bit definitions for the Instruction MMU Control Register
>   *
>   */
> -#define SPR_IMMUCR_P2S    0x0000003e  /* Level 2 Page Size */
> -#define SPR_IMMUCR_P1S    0x000007c0  /* Level 1 Page Size */
> -#define SPR_IMMUCR_VADDR_WIDTH 0x0000f800  /* Virtual ADDR Width */
> -#define SPR_IMMUCR_PADDR_WIDTH 0x000f0000  /* Physical ADDR Width */
> +#define SPR_IMMUCR_PTBP           0xfffffc00  /* Page Table Base Pointer */
> +#define SPR_IMMUCR_ITF    0x00000001  /* ITLB Flush */
>
>  /*
>   * Bit definitions for the Data TLB Match Register
> diff --git a/arch/openrisc/kernel/head.S b/arch/openrisc/kernel/head.S
> index 1d3c9c2..59a3263 100644
> --- a/arch/openrisc/kernel/head.S
> +++ b/arch/openrisc/kernel/head.S
> @@ -541,6 +541,15 @@ flush_tlb:
>
>  enable_mmu:
>         /*
> +        * Make sure the page table base pointer is cleared
> +        * ( = hardware tlb fill disabled)
> +        */
> +       l.movhi r30,0
> +       l.mtspr r0,r30,SPR_DMMUCR
> +       l.movhi r30,0
> +       l.mtspr r0,r30,SPR_IMMUCR
> +
> +       /*
>          * enable dmmu & immu
>          * SR[5] = 0, SR[6] = 0, 6th and 7th bit of SR set to 0
>          */
> diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
> index e2bfafc..4c07a20 100644
> --- a/arch/openrisc/mm/fault.c
> +++ b/arch/openrisc/mm/fault.c
> @@ -78,7 +78,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
>          */
>
>         if (address >= VMALLOC_START &&
> -           (vector != 0x300 && vector != 0x400) &&
> +           /*(vector != 0x300 && vector != 0x400) &&*/
>             !user_mode(regs))
>                 goto vmalloc_fault;

This won't work as things stand today...

>
> diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
> index e7fdc50..d8b8068 100644
> --- a/arch/openrisc/mm/init.c
> +++ b/arch/openrisc/mm/init.c
> @@ -191,6 +191,14 @@ void __init paging_init(void)
>         mtspr(SPR_ICBIR, 0x900);
>         mtspr(SPR_ICBIR, 0xa00);
>
> +       /*
> +        * Update the pagetable base pointer, to enable hardware tlb refill if
> +        * supported by the hardware
> +        */
> +       mtspr(SPR_IMMUCR, __pa(current_pgd) & SPR_IMMUCR_PTBP);
> +       mtspr(SPR_DMMUCR, __pa(current_pgd) & SPR_DMMUCR_PTBP);
> +
> +
>         /* New TLB miss handlers and kernel page tables are in now place.
>          * Make sure that page flags get updated for all pages in TLB by
>          * flushing the TLB and forcing all TLB entries to be recreated
> diff --git a/arch/openrisc/mm/tlb.c b/arch/openrisc/mm/tlb.c
> index 683bd4d..96e6df3 100644
> --- a/arch/openrisc/mm/tlb.c
> +++ b/arch/openrisc/mm/tlb.c
> @@ -151,6 +151,14 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>          */
>         current_pgd = next->pgd;
>
> +       /*
> +        * Update the pagetable base pointer with the new pgd.
> +        * This only have effect on implementations with hardware tlb refill
> +        * support.
> +        */
> +       mtspr(SPR_IMMUCR, __pa(current_pgd) & SPR_IMMUCR_PTBP);
> +       mtspr(SPR_DMMUCR, __pa(current_pgd) & SPR_DMMUCR_PTBP);
> +
>         /* We don't have context support implemented, so flush all
>          * entries belonging to previous map
>          */
> --- >8 ---
> _______________________________________________
> Linux mailing list
> Linux@lists.openrisc.net
> http://lists.openrisc.net/listinfo/linux

/Jonas
