Steve Reinhardt wrote:
> I haven't looked at the code... how is the x86 page-table walk handled
> currently? Is it done in microcode or do we have a "hardware" state
> machine for it?  It seems to me that in the long run we want an
> autonomous HW page-table walker, and that the idea of a "TLB miss
> fault" for x86 should go away.

There's a "hardware" state machine that's attached to the TLB. I agree
that the TLB miss fault should go away.

>
> One way to change the CPU/memory interface that might not be too
> disruptive to the CPU models and would also mirror a real HW
> implementation more closely (always a good sign, IMO) would be simply
> to push translation to the other side of the decoupled callback
> interface.  In Gabe's model, this would be (for timing mode):
>
> 1. Instruction generates request.
> 2. CPU asks TLB/cache to translate and satisfy request.  There is no
> immediate feedback.
> 3. Get coffee while request is handled.
> 4. The request comes back, possibly indicating a fault. If there's a
> fault, handle it; if not, finish the instruction.
>
> Then all of the translate/page-table walk/skip cache on page fault
> etc. stuff happens concurrently in the memory system in step 3.
>
> I haven't thought through what this implies in detail... is the TLB
> now a first-class memory-system object with a Port interface that sits
> between the CPU and the cache?  If that's too much overhead, is there
> a better way to do it?

I like this idea and think we should head in that direction. There are
two possible concerns with doing things this way, though. First, the CPU
won't be able to get at the physical address of a request as easily as
before, so it might not be able to do things like load-store forwarding
as effectively. That's speculation on my part, since I'm not sure how
forwarding is done in o3 or in a real CPU. It may well be based on
virtual addresses anyway, to keep the TLB lookup out of the critical
path. Second, things could get more complicated when it comes to
virtually versus physically tagged and indexed caches, back probing,
and coherence. I'm not familiar enough with the details of those
systems to predict what the complications might be.

Those complications aside, though, I think moving the TLB out of the CPU
and into the memory system is a good thing.
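
To make the four quoted steps concrete, here is a minimal sketch of a
timing-mode flow with the TLB on the memory side of the callback
interface. It is not M5 code. Every name in it (EventQueue, Request,
TLB, Memory, CPU, demo), the latencies, the 4 KiB page size, and the
flat dict standing in for the page-table walker are assumptions made
up for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size for this sketch


@dataclass
class Request:
    vaddr: int
    paddr: Optional[int] = None   # filled in by the TLB, not the CPU
    fault: Optional[str] = None   # set if translation fails
    data: Optional[int] = None


class EventQueue:
    """Tiny discrete-event queue standing in for the simulator's event loop."""

    def __init__(self):
        self.now = 0
        self._events = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._events:
            self.now, _, callback = heapq.heappop(self._events)
            callback()


class Memory:
    """Hypothetical cache/memory: answers after a fixed latency."""

    def __init__(self, eq, latency=10):
        self.eq = eq
        self.latency = latency
        self.store = {}

    def send(self, req, done):
        def respond():
            req.data = self.store.get(req.paddr, 0)
            done(req)
        self.eq.schedule(self.latency, respond)


class TLB:
    """Sits between the CPU and memory: translate, walk on a miss, forward."""

    def __init__(self, eq, memory, page_table, walk_latency=30):
        self.eq = eq
        self.memory = memory
        self.page_table = page_table  # vpn -> pfn, stand-in for a real walk
        self.walk_latency = walk_latency
        self.entries = {}

    def send(self, req, done):
        vpn, offset = divmod(req.vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self._forward(req, vpn, offset, done)
        else:
            # The walk happens in the memory system; the CPU just waits.
            self.eq.schedule(self.walk_latency,
                             lambda: self._walk_done(req, vpn, offset, done))

    def _walk_done(self, req, vpn, offset, done):
        if vpn not in self.page_table:
            req.fault = "page fault"
            done(req)  # skip the cache entirely on a fault
            return
        self.entries[vpn] = self.page_table[vpn]
        self._forward(req, vpn, offset, done)

    def _forward(self, req, vpn, offset, done):
        req.paddr = self.entries[vpn] * PAGE_SIZE + offset
        self.memory.send(req, done)


class CPU:
    def __init__(self, eq, port):
        self.eq = eq
        self.port = port
        self.log = []

    def execute_load(self, vaddr):
        # Steps 1-2: build the request and hand it off; no immediate feedback.
        self.port.send(Request(vaddr), self.complete)

    def complete(self, req):
        # Step 4: the request comes back, possibly carrying a fault.
        if req.fault:
            self.log.append((self.eq.now, "fault", req.vaddr))
        else:
            self.log.append((self.eq.now, "done", req.paddr))


def demo():
    eq = EventQueue()
    tlb = TLB(eq, Memory(eq), page_table={0x1: 0x80})
    cpu = CPU(eq, tlb)
    cpu.execute_load(0x1004)  # mapped page: walk, then memory access
    cpu.execute_load(0x5000)  # unmapped page: walk ends in a fault
    eq.run()                  # step 3: get coffee
    return cpu.log
```

Note that in this sketch the CPU only sees the physical address when
the response arrives, which is exactly the first concern above.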

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev