Great, this will definitely speed things up.

I would suggest enabling the hardware TLB refill through bit 17 of the
Supervision Register (SR) rather than through a zero or nonzero DMMUCR or
IMMUCR value. That would be more consistent with how the specification
controls such features.
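
To make the suggestion concrete, here is a rough sketch of what testing and
setting such an enable bit could look like. The name SPR_SR_HTR and the
sr_shadow variable are made up for illustration; bit 17 is only the proposal
above and is not defined in spr_defs.h. The SR is simulated with a plain
variable so the snippet compiles on its own; kernel code would go through
mfspr()/mtspr() instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: proposed "hardware TLB refill enable" bit, SR[17]. */
#define SPR_SR_HTR (UINT32_C(1) << 17)

/* Stand-in for the real SR so the sketch is self-contained. */
static uint32_t sr_shadow;

/* Set or clear the proposed enable bit, leaving all other SR bits alone. */
static void hw_tlb_refill_set(bool enable)
{
	if (enable)
		sr_shadow |= SPR_SR_HTR;
	else
		sr_shadow &= ~SPR_SR_HTR;
}

/* True when the proposed enable bit is set. */
static bool hw_tlb_refill_enabled(void)
{
	return (sr_shadow & SPR_SR_HTR) != 0;
}
```

With a dedicated bit, xMMUCR could hold any page table base pointer,
including zero, without that value doubling as the on/off switch.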


On 7/27/2013 9:02 PM, Stefan Kristiansson wrote:
> Good news everybody!
>
> We've got some hardware tlb reload going on in the hottest OpenRISC 1000
> implementation there is, no more wasting instructions on tlb miss exceptions
> when running Linux.
>
> As a rough estimate (by looking at simulation waveforms and comparing
> the time spent in the tlb miss exception handler and the time spent when
> doing a hw reload), the hardware tlb reload should be about 7.5 times faster
> than the software reload.
> The hardware reload isn't completely optimized, so you could still shave off
> a couple of cycles there.
> Perhaps that is true for the tlb miss handler in Linux too, so the rough
> estimate is probably a good enough indicator of what kind of speedup we
> can expect from this.
>
> Another rough estimate of how much time is spent in the tlb miss vectors
> was done by running 'gcc hello_world.c -o hello_world' in the jor1k
> emulator (http://s-macke.github.io/jor1k/) and by using the stats from that
> we saw that (momentarily) roughly up to 25% of the time was spent in the
> dtlb miss exception handler.
> This could of course also be improved by increasing the number of sets and
> ways used in the mmus, but that's another topic that might be addressed in
> the future.
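
As a back-of-the-envelope illustration, the two rough figures quoted above
(7.5 times faster reload, up to 25% of the time in the dtlb miss handler) can
be combined with Amdahl's law. This is only a sketch: the 25% was a momentary
peak for one workload, so the result is closer to a best case than to an
average. The function name is made up for illustration.

```c
/*
 * Amdahl's law: overall speedup when a fraction f of the run time
 * becomes s times faster and the rest is unchanged.
 */
static double overall_speedup(double f, double s)
{
	return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.25 and s = 7.5 this gives 1 / (0.75 + 0.0333), or about 1.28, so
roughly a 28% overall gain while the workload is that tlb-miss heavy.
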
>
> As always, you can find it in the github repos at:
> https://github.com/openrisc/mor1kx
>
> But before we bring out the champagne and start celebrating, some notes about
> the implementation that need some discussion.
>
> First, it doesn't exactly follow the arch specification's definition of the
> pagetable entries (pte); instead it uses the pte layout that our Linux port
> defines.
>
> Let me illustrate the differences.
> or1k arch spec pte layout:
> | 31 ... 10 | 9 | 8 ... 6 | 5 | 4 | 3 | 2 | 1 | 0 |
> |    PPN    | L |PP INDEX | D | A |WOM|WBC|CI |CC |
>
> Linux pte layout:
> | 31 ... 12 |  11  | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |   0   |
> |    PPN    |SHARED|EXEC|SWE|SRE|UWE|URE| D | A |WOM|WBC|CI |PRESENT|
>
> The biggest difference is that the arch spec defines a separate register
> (xMMUPR) which holds a table of protection bits, and the PP INDEX field
> of the pte is used to pick out the "right" protection flags from that.
> In our Linux port on the other hand, it has been chosen to not follow
> this and embed the protection bits straight into the pte (which of
> course is perfectly fine as it was designed for software tlb reload).
> So, the question is, should we change Linux to be compliant with the
> arch spec's definition of the ptes and start using a PP index field, or
> change the arch spec to allow use of the Linux definition?
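
For reference while discussing this, the Linux pte layout from the table above
can be written down as bit masks. The sketch below is only an illustration of
that table: the PTE_* names and both helper functions are made up here and are
not taken from the kernel headers or from mor1kx.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the "Linux pte layout" table. */
#define PTE_PRESENT   (1u << 0)
#define PTE_URE       (1u << 6)   /* user read enable */
#define PTE_UWE       (1u << 7)   /* user write enable */
#define PTE_SRE       (1u << 8)   /* supervisor read enable */
#define PTE_SWE       (1u << 9)   /* supervisor write enable */
#define PTE_EXEC      (1u << 10)
#define PTE_PPN_SHIFT 12          /* PPN occupies bits 31..12 */

/* Extract the physical page number. */
static uint32_t pte_ppn(uint32_t pte)
{
	return pte >> PTE_PPN_SHIFT;
}

/* Check a data access against the protection bits embedded in the pte. */
static bool pte_data_access_ok(uint32_t pte, bool user, bool write)
{
	uint32_t need;

	if (!(pte & PTE_PRESENT))
		return false;
	if (user)
		need = write ? PTE_UWE : PTE_URE;
	else
		need = write ? PTE_SWE : PTE_SRE;
	return (pte & need) != 0;
}
```

With the arch spec layout, the same check would need an extra lookup: the
PP INDEX field would select an entry in xMMUPR, and the protection flags would
come from there instead of from the pte itself.
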
>
> Second, naturally there are a couple of changes needed to Linux for this to
> work.
> The changes are minor but need comments before proper patches are sent out.
> The full diff is available at the end of this mail, but I'll first comment on
> the changes to each file.
>
> arch/openrisc/include/asm/spr_defs.h:
> The defines for the bitfields of xMMUCR are wrong in all of our spr_defs.h.
> I tried to dig into where those defines come from, but the arch spec
> and spr_defs.h have been different since the beginning of time (or as far
> back as the commit histories go, some time in year 2000).
>
> arch/openrisc/kernel/head.S:
> The implementation in mor1kx works such that if the xMMUCR register is 0,
> it will generate tlb miss exceptions. So we have to make sure that it
> is zero when the MMUs are enabled, so that the boot tlb miss handlers are
> used until paging is set up.
>
> arch/openrisc/mm/init.c:
> arch/openrisc/mm/tlb.c:
> The correct value of the pagetable base pointer is written to the xMMUCR
> registers right after paging is initially set up and on each switch_mm.
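
A small sketch of the value written in those two places, using the PTBP mask
from the patch (0xfffffc00). The helper names are made up for illustration,
and the "zero means hardware refill off" rule is the mor1kx behaviour
described above for head.S, not something taken from the arch spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page Table Base Pointer field of xMMUCR, as defined in the patch. */
#define MMUCR_PTBP 0xfffffc00u

/* Value to write to xMMUCR for a given physical pgd address. */
static uint32_t mmucr_from_pgd(uint32_t pgd_phys)
{
	return pgd_phys & MMUCR_PTBP;
}

/* With the current scheme, a zero base pointer means "raise tlb miss". */
static bool hw_refill_active(uint32_t mmucr)
{
	return (mmucr & MMUCR_PTBP) != 0;
}
```

The mask clears the low 10 bits, so a page-aligned pgd passes through
unchanged. It also shows the corner case behind the SR bit suggestion: a page
table placed at physical address 0 could not be used with hardware refill.
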
>
> arch/openrisc/mm/fault.c:
> do_pagefault is called a bit differently when it is called from the pagefault
> exception vectors and when it is called from the tlb miss exception vectors.
> I've put in a hack there to make that difference disappear, but this has
> to be addressed properly and as I see it there are two ways.
>
> 1) Do the necessary checks in do_pagefault to see if it should handle a
>    protection fault, or a missing page fault.
> 2) Make mor1kx generate a tlb miss exception instead of a pagefault when the
>    pte table pointer is zero or the PRESENT bit is not set.
>
> Some thoughts and comments on those issues, please!
>
> Stefan
>
> --- >8 ---
> diff --git a/arch/openrisc/include/asm/spr_defs.h b/arch/openrisc/include/asm/spr_defs.h
> index 5dbc668..1d20915 100644
> --- a/arch/openrisc/include/asm/spr_defs.h
> +++ b/arch/openrisc/include/asm/spr_defs.h
> @@ -226,19 +226,15 @@
>   * Bit definitions for the Data MMU Control Register
>   *
>   */
> -#define SPR_DMMUCR_P2S          0x0000003e  /* Level 2 Page Size */
> -#define SPR_DMMUCR_P1S          0x000007c0  /* Level 1 Page Size */
> -#define SPR_DMMUCR_VADDR_WIDTH       0x0000f800  /* Virtual ADDR Width */
> -#define SPR_DMMUCR_PADDR_WIDTH       0x000f0000  /* Physical ADDR Width */
> +#define SPR_DMMUCR_PTBP         0xfffffc00  /* Page Table Base Pointer */
> +#define SPR_DMMUCR_DTF          0x00000001  /* DTLB Flush */
>  
>  /*
>   * Bit definitions for the Instruction MMU Control Register
>   *
>   */
> -#define SPR_IMMUCR_P2S          0x0000003e  /* Level 2 Page Size */
> -#define SPR_IMMUCR_P1S          0x000007c0  /* Level 1 Page Size */
> -#define SPR_IMMUCR_VADDR_WIDTH       0x0000f800  /* Virtual ADDR Width */
> -#define SPR_IMMUCR_PADDR_WIDTH       0x000f0000  /* Physical ADDR Width */
> +#define SPR_IMMUCR_PTBP         0xfffffc00  /* Page Table Base Pointer */
> +#define SPR_IMMUCR_ITF          0x00000001  /* ITLB Flush */
>  
>  /*
>   * Bit definitions for the Data TLB Match Register
> diff --git a/arch/openrisc/kernel/head.S b/arch/openrisc/kernel/head.S
> index 1d3c9c2..59a3263 100644
> --- a/arch/openrisc/kernel/head.S
> +++ b/arch/openrisc/kernel/head.S
> @@ -541,6 +541,15 @@ flush_tlb:
>  
>  enable_mmu:
>       /*
> +      * Make sure the page table base pointer is cleared
> +      * ( = hardware tlb fill disabled)
> +      */
> +     l.movhi r30,0
> +     l.mtspr r0,r30,SPR_DMMUCR
> +     l.movhi r30,0
> +     l.mtspr r0,r30,SPR_IMMUCR
> +
> +     /*
>        * enable dmmu & immu
>        * SR[5] = 0, SR[6] = 0, 6th and 7th bit of SR set to 0
>        */
> diff --git a/arch/openrisc/mm/fault.c b/arch/openrisc/mm/fault.c
> index e2bfafc..4c07a20 100644
> --- a/arch/openrisc/mm/fault.c
> +++ b/arch/openrisc/mm/fault.c
> @@ -78,7 +78,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long address,
>        */
>  
>       if (address >= VMALLOC_START &&
> -         (vector != 0x300 && vector != 0x400) &&
> +         /*(vector != 0x300 && vector != 0x400) &&*/
>           !user_mode(regs))
>               goto vmalloc_fault;
>  
> diff --git a/arch/openrisc/mm/init.c b/arch/openrisc/mm/init.c
> index e7fdc50..d8b8068 100644
> --- a/arch/openrisc/mm/init.c
> +++ b/arch/openrisc/mm/init.c
> @@ -191,6 +191,14 @@ void __init paging_init(void)
>       mtspr(SPR_ICBIR, 0x900);
>       mtspr(SPR_ICBIR, 0xa00);
>  
> +     /*
> +      * Update the pagetable base pointer, to enable hardware tlb refill if
> +      * supported by the hardware
> +      */
> +     mtspr(SPR_IMMUCR, __pa(current_pgd) & SPR_IMMUCR_PTBP);
> +     mtspr(SPR_DMMUCR, __pa(current_pgd) & SPR_DMMUCR_PTBP);
> +
> +
>       /* New TLB miss handlers and kernel page tables are in now place.
>        * Make sure that page flags get updated for all pages in TLB by
>        * flushing the TLB and forcing all TLB entries to be recreated
> diff --git a/arch/openrisc/mm/tlb.c b/arch/openrisc/mm/tlb.c
> index 683bd4d..96e6df3 100644
> --- a/arch/openrisc/mm/tlb.c
> +++ b/arch/openrisc/mm/tlb.c
> @@ -151,6 +151,14 @@ void switch_mm(struct mm_struct *prev, struct mm_struct *next,
>        */
>       current_pgd = next->pgd;
>  
> +     /*
> +      * Update the pagetable base pointer with the new pgd.
> +      * This only have effect on implementations with hardware tlb refill
> +      * support.
> +      */
> +     mtspr(SPR_IMMUCR, __pa(current_pgd) & SPR_IMMUCR_PTBP);
> +     mtspr(SPR_DMMUCR, __pa(current_pgd) & SPR_DMMUCR_PTBP);
> +
>       /* We don't have context support implemented, so flush all
>        * entries belonging to previous map
>        */
> --- >8 ---
> _______________________________________________
> Linux mailing list
> Linux@lists.openrisc.net
> http://lists.openrisc.net/listinfo/linux
