Re: [ORLinux] Hardware assisted tlb reload in mor1kx

Stefan Kristiansson Mon, 29 Jul 2013 04:44:51 -0700

On Mon, Jul 29, 2013 at 10:19:37AM +0200, Jonas Bonn wrote:
> > As a rough estimate (by looking at simulation waveforms and comparing
> > the time spent in the tlb miss exception handler and the time spent when
> > doing a hw reload), the hardware tlb reload should be about 7.5 times faster
> > than the software reload.
> > The hardware reload isn't completely optimized, so you could still shave off
> > a couple of cycles there.
> > Perhaps that is true for the tlb miss handler in Linux too, so the rough
> > estimate is probably a good enough indicator of what kind of speedup we
> > can expect from this.
> 
> Walking the page table is pretty much the same operation whether it be
> in software or hardware... the savings are the context switch
> associated with the exception handler.
>


+ the overhead of doing shift and mask operations.
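
To make the shift and mask part concrete, here is a minimal sketch of the
index calculations a two-level walk has to do in software. It assumes 8K
pages, 4-byte entries and a page directory shift of 24; the helper names are
made up for illustration and are not taken from the kernel or from mor1kx.

```c
#include <stdint.h>

/* Assumed geometry: 8K pages, 4-byte entries, two-level page table. */
#define WALK_PAGE_SHIFT   13
#define WALK_PTE_BITS     (WALK_PAGE_SHIFT - 2)              /* 2048 PTEs per 8K table */
#define WALK_PGDIR_SHIFT  (WALK_PAGE_SHIFT + WALK_PTE_BITS)  /* 24 */

/* Index into the first-level table (page directory). */
static inline uint32_t walk_pgd_index(uint32_t vaddr)
{
    return vaddr >> WALK_PGDIR_SHIFT;
}

/* Index into the second-level table (the PTEs). */
static inline uint32_t walk_pte_index(uint32_t vaddr)
{
    return (vaddr >> WALK_PAGE_SHIFT) & ((1u << WALK_PTE_BITS) - 1);
}

/* Byte offset inside the 8K page. */
static inline uint32_t walk_page_offset(uint32_t vaddr)
{
    return vaddr & ((1u << WALK_PAGE_SHIFT) - 1);
}
```

In hardware these are just wire selections, while the software handler spends
instructions on each of them.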

> >
> > First, it doesn't exactly follow the arch specifications definition of the
> > pagetable entries (pte), instead it uses the pte layout that our Linux port
> > defines.
> >
> > Let me illustrate the differences.
> > or1k arch spec pte layout:
> > | 31 ... 10 | 9 | 8 ... 6 | 5 | 4 | 3 | 2 | 1 | 0 |
> > |    PPN    | L |PP INDEX | D | A |WOM|WBC|CI |CC |
> >
> 
> We have 8K pages... why so many bits for the PPN?  Bits 31, 30, and 29
> are never used?
> 

My guess is that this is another oversight in the arch spec. The page size had
probably not been decided when they defined the PTE.
With 8K pages it is bits 12, 11, and 10 of the PPN field that are not used.

> 
> > Linux pte layout:
> > | 31 ... 12 |  11  | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 |   0   |
> > |    PPN    |SHARED|EXEC|SWE|SRE|UWE|URE| D | A |WOM|WBC|CI |PRESENT|
> >

So, I was sloppy here: the PPN is actually bits 31 - 13, and bit 12 is not used.

> > The biggest difference is that the arch spec defines a separate register
> > (xMMUPR) which holds a table of protection bits, and the PP INDEX field
> > of the pte is used to pick out the "right" protection flags from that.
> > In our Linux port on the other hand, it has been chosen to not follow
> > this and embed the protection bits straight into the pte (which of
> > course is perfectly fine as it was designed for software tlb reload).
> > So, the question is, should we change Linux to be compliant with the
> > arch specs definition of the ptes and start using a PP index field or
> > change the arch spec to allow usages of the Linux definition?
> 
> What are the protection combinations that are actually used:
> 
> SR|SW|SX /* really? */
> SR|SW
> SR|SX
> SR
> 
> /* We may want to drop the SX's here */
> SR|SW|SX|UR|UW|UX
> SR|SW|SX|UR|UX
> SR|SW|SX|UR|UW
> SR|SW|SX|UR
> 
> Is that all?  If yes, then SR is always set, and UR is always set for
> user pages.
> 

I'm not sure, but it doesn't sound too far fetched.

> So we have:
> 
> USERPAGE?
> WRITABLE?
> EXECUTABLE?
> 
> ...that's 3 bits, which maps nicely into the 3 bits available for PP
> INDEX.  So the hardware and software implementations aren't
> contradictory there.
> 

Yes, that sounds good.
And we could setup a static mapping in the PPI registers for that,
and leave it up to implementations to actually implement the PPI
registers or use the same static mapping.
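
As a rough sketch of what such a static mapping could look like, the function
below expands a 3-bit index into the protection combinations from the list
quoted above. The bit assignment inside the index (bit 0 = user, bit 1 =
writable, bit 2 = executable) and all names and flag values are assumptions
made for illustration, nothing here is defined by the arch spec.

```c
#include <stdint.h>

/* Protection flags named after the quoted list; the values are hypothetical. */
enum {
    PP_SR = 1 << 0, PP_SW = 1 << 1, PP_SX = 1 << 2,
    PP_UR = 1 << 3, PP_UW = 1 << 4, PP_UX = 1 << 5,
};

/* Hypothetical encoding of the 3-bit PP INDEX. */
#define PPI_USER   (1u << 0)
#define PPI_WRITE  (1u << 1)
#define PPI_EXEC   (1u << 2)

/* Expand a PP INDEX into the protection flags it would select. */
static inline uint8_t ppi_to_prot(unsigned ppi)
{
    uint8_t prot = PP_SR;               /* SR is always set */

    if (ppi & PPI_USER) {
        /* User pages: supervisor gets everything, UR is always set. */
        prot |= PP_SW | PP_SX | PP_UR;
        if (ppi & PPI_WRITE)
            prot |= PP_UW;
        if (ppi & PPI_EXEC)
            prot |= PP_UX;
    } else {
        /* Kernel pages: only supervisor bits. */
        if (ppi & PPI_WRITE)
            prot |= PP_SW;
        if (ppi & PPI_EXEC)
            prot |= PP_SX;
    }
    return prot;
}
```

An implementation without real PPI registers could hardwire the same eight
results.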

> L ("link") isn't interesting for the software implementation, so
> reusing that for SHARED is fine there, but the HW implementation wants
> L... where does SHARED go in that case and, furthermore, what is it
> even used for?  Somebody please check what that SHARED flag is doing.
> 

We can do without the L, if we just assume a two-level page table,
but it sounds better to actually do it properly and set L ("last" not "link")
on the actual PTE.

I'm not sure about SHARED. Looking around, many architectures seem to set
it to a combination of PRESENT, USER, RW and ACCESSED.
I'll continue investigating this.

> Finally, we play games with the CC bit since we don't have an SMP
> Implementation of OpenRISC.  It's used to indicate that a page is
> swapped out.  And the WBC bit is aliased to distinguish page cache via
> the PAGE_FILE flag.  For the HW implementation we can't do this, so
> where do we put these?  Bits 31 and 30 and have the HW mapper mask
> them out?
> 

So, under the assumption that all of the above turns out ok (and the SHARED
flag has to have its own bit), our Linux PTE could look something like this:

| 31 ... 13 | 12 |  11  |   10  | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|    PPN    |FILE|SHARED|PRESENT| L | X | W | U | D | A |WOM|WBC|CI |CC |
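
Written out as C defines, the proposed layout would be something like the
sketch below. The macro and helper names are made up for illustration, and
treating FILE, SHARED and PRESENT as bits the hardware masks out is an
assumption based on the discussion above, not something that is decided.

```c
#include <stdint.h>

/* Bits 5..0 as in the arch spec PTE. */
#define PTE_CC         (1u << 0)
#define PTE_CI         (1u << 1)
#define PTE_WBC        (1u << 2)
#define PTE_WOM        (1u << 3)
#define PTE_A          (1u << 4)
#define PTE_D          (1u << 5)
/* Bits 8..6 sit where the arch spec has the PP INDEX field. */
#define PTE_U          (1u << 6)
#define PTE_W          (1u << 7)
#define PTE_X          (1u << 8)
#define PTE_L          (1u << 9)
/* Bits 12..10 are unused by the hardware with 8K pages. */
#define PTE_PRESENT    (1u << 10)
#define PTE_SHARED     (1u << 11)
#define PTE_FILE       (1u << 12)
#define PTE_PPN_SHIFT  13
#define PTE_PPN_MASK   (~0u << PTE_PPN_SHIFT)

/* Assumed: software-only bits that a hw reload would mask out. */
#define PTE_SW_MASK    (PTE_PRESENT | PTE_SHARED | PTE_FILE)

/* Physical base address of the page the PTE maps. */
static inline uint32_t pte_phys_base(uint32_t pte)
{
    return pte & PTE_PPN_MASK;
}

/* The U/W/X bits read as a 3-bit PP INDEX. */
static inline unsigned pte_pp_index(uint32_t pte)
{
    return (pte >> 6) & 0x7;
}

/* What the hardware would see after masking the software bits. */
static inline uint32_t pte_hw_view(uint32_t pte)
{
    return pte & ~PTE_SW_MASK;
}
```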

> > arch/openrisc/include/asm/spr_defs.h:
> > The defines for the bitfields of xMMUCR are wrong in all of our spr_defs.h,
> > I tried to dig into where those defines come from, but both the arch spec
> > and spr_defs.h have been different since the beginning of time (or as far
> > back as the commit histories go, some time in year 2000).
> >
> 
> I think that file has even more errors than that.  Didn't somebody fix
> this file up in or1ksim but not sync the kernel version?
> 

I think they are pretty much in sync, but I'm not sure.
There might be more errors in it, but at least this was in all versions I
looked at.

> > arch/openrisc/mm/init.c:
> > arch/openrisc/mm/tlb.c:
> > The correct value of the pagetable base pointer is written to the xMMUCR
> > registers right after paging is initially set up and on each switch_mm.
> >
> > arch/openrisc/mm/fault.c:
> > do_pagefault is called a bit differently when it is called from the
> > pagefault exception vectors and when it is called from the tlb miss
> > exception vectors.
> > I've put in a hack there to make that difference disappear, but this has
> > to be addressed properly and as I see it there are two ways.
> 
> >
> > 1) Do the necessary checks in do_pagefault to see if it should handle a
> >    protection fault, or a missing page fault.
> 
> I think this is the right approach.
> 

I think so too, so let's do that. I'll take a look at what needs to be done.

> >         if (address >= VMALLOC_START &&
> > -           (vector != 0x300 && vector != 0x400) &&
> > +           /*(vector != 0x300 && vector != 0x400) &&*/
> >             !user_mode(regs))
> >                 goto vmalloc_fault;
> 
> This won't work as things stand today...
> 

Certainly not, but it served as a good example of the problem I wanted to show.

Stefan
_______________________________________________
Linux mailing list
Linux@lists.openrisc.net
http://lists.openrisc.net/listinfo/linux
