While we figure this out, I'll implement something quick and ugly so I
can shake out bugs further along. Please comment on what you think the
coherence and virtual/physical addressing implications of decoupling
the TLB from the CPU may be.
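
To make the discussion concrete, here is a rough sketch of the four-step timing-mode flow Steve lists in the quoted text below. It is only an illustration and not M5 code: every name in it (TranslatingMemory, Request, Cpu, send_request, run) is made up for this example, the page table is a plain dict, and a deque stands in for the simulator's event queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096  # assumed page size for the example


@dataclass
class Request:
    vaddr: int
    paddr: Optional[int] = None   # filled in by the memory side, not the CPU
    fault: Optional[str] = None
    data: Optional[int] = None


class TranslatingMemory:
    """Hypothetical TLB + page-table walker + memory behind one callback interface."""

    def __init__(self, page_table, memory):
        self.page_table = page_table  # virtual page number -> physical page number
        self.memory = memory          # physical address -> value
        self.tlb = {}
        self.pending = deque()        # stand-in for the event queue

    def send_request(self, req, callback):
        # Step 2: accept the request; the CPU gets no immediate feedback.
        self.pending.append((req, callback))

    def run(self):
        # Step 3: translation, page-table walk, and the access all happen here.
        while self.pending:
            req, callback = self.pending.popleft()
            vpn, offset = divmod(req.vaddr, PAGE_SIZE)
            if vpn not in self.tlb:
                if vpn in self.page_table:
                    # The walker fills the TLB; no "TLB miss fault" reaches the CPU.
                    self.tlb[vpn] = self.page_table[vpn]
                else:
                    req.fault = "page fault"  # skip the memory access entirely
            if req.fault is None:
                req.paddr = self.tlb[vpn] * PAGE_SIZE + offset
                req.data = self.memory.get(req.paddr, 0)
            callback(req)  # Step 4: the request comes back to the CPU


class Cpu:
    def __init__(self, mem):
        self.mem = mem
        self.log = []

    def execute_load(self, vaddr):
        # Step 1: the instruction generates a request with a virtual address only.
        self.mem.send_request(Request(vaddr), self.complete)

    def complete(self, req):
        # Step 4: handle the fault if there is one, otherwise finish the instruction.
        if req.fault:
            self.log.append(("fault", req.vaddr, req.fault))
        else:
            self.log.append(("done", req.vaddr, req.data))


mem = TranslatingMemory(page_table={1: 7}, memory={7 * PAGE_SIZE + 0x10: 42})
cpu = Cpu(mem)
cpu.execute_load(1 * PAGE_SIZE + 0x10)  # mapped page: walker fills the TLB
cpu.execute_load(5 * PAGE_SIZE)         # unmapped page: comes back as a fault
mem.run()
print(cpu.log)
```

Note that in this sketch the CPU only learns the physical address when the request returns, which is exactly the load-store forwarding concern raised in the quoted reply.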

Gabe

Gabe Black wrote:
> Steve Reinhardt wrote:
>   
>> I haven't looked at the code... how is the x86 page-table walk handled
>> currently? Is it done in microcode or do we have a "hardware" state
>> machine for it?  It seems to me that in the long run we want an
>> autonomous HW page-table walker, and that the idea of a "TLB miss
>> fault" for x86 should go away.
>>     
>
> There's a "hardware" state machine that's attached to the TLB. I agree
> that the TLB miss fault should go away.
>
>   
>> One way to change the CPU/memory interface that might not be too
>> disruptive to the CPU models and would also mirror a real HW
>> implementation more closely (always a good sign, IMO) would be simply
>> to push translation to the other side of the decoupled callback
>> interface.  In Gabe's model, this would be (for timing mode):
>>
>> 1. Instruction generates request.
>> 2. CPU asks TLB/cache to translate and satisfy request.  There is no
>> immediate feedback.
>> 3. Get coffee while request is handled.
>> 4. The request comes back, possibly indicating a fault. If there's a
>> fault, handle it; if not, finish the instruction.
>>
>> Then all of the translate/page-table walk/skip cache on page fault
>> etc. stuff happens concurrently in the memory system in step 3.
>>
>> I haven't thought through what this implies in detail... is the TLB
>> now a first-class memory-system object with a Port interface that sits
>> between the CPU and the cache?  If that's too much overhead, is there
>> a better way to do it?
>>     
>
> I like this idea and think we should head in that direction. There are
> two possible concerns with doing things this way though. First, the CPU
> won't be able to get at the physical address of a request as easily as
> before, so it might not be able to, for instance, do load-store
> forwarding as effectively. That's speculation on my part since I'm not
> sure how it's done in o3 or in a real CPU. In fact, forwarding may be
> based on virtual addresses anyway, to keep the TLB lookup out of the
> critical path. Second, things could get more complicated with
> virtually/physically tagged/indexed caches, back probing, and
> coherence. I'm not familiar enough with the details of those systems
> to predict what the complications might be.
>
> Those complications aside, though, I think moving the TLB out of the CPU
> and into the memory system is a good thing.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>   
