While we figure this out, I'll implement something quick and ugly so I can shake out bugs further down the line. Please comment on what you think the implications of decoupling the TLB from the CPU might be for coherence and for virtual versus physical tagging/indexing.
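
To make the discussion concrete, here is a rough sketch of the four-step timing-mode flow Steve describes in the quoted text below. It's a toy Python model, not M5 code: EventQueue, DecoupledTLB, Response, the latencies, and the 4 KiB page size are all made up for illustration, and the "page-table walk" is just a dictionary lookup with a fixed delay.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages


@dataclass
class Response:
    vaddr: int
    paddr: Optional[int] = None  # filled in on the memory side, not by the CPU
    data: Optional[int] = None
    fault: Optional[str] = None  # None means the access succeeded


class EventQueue:
    """Tiny discrete-event queue holding (tick, sequence, callback) entries."""

    def __init__(self) -> None:
        self.now = 0
        self._seq = 0
        self._events: List[Tuple[int, int, Callable[[], None]]] = []

    def schedule(self, delay: int, fn: Callable[[], None]) -> None:
        heapq.heappush(self._events, (self.now + delay, self._seq, fn))
        self._seq += 1

    def run(self) -> None:
        while self._events:
            self.now, _, fn = heapq.heappop(self._events)
            fn()


class DecoupledTLB:
    """Hypothetical TLB that sits between the CPU and memory."""

    HIT_LATENCY, WALK_LATENCY, MEM_LATENCY = 1, 20, 10  # arbitrary ticks

    def __init__(self, eq: EventQueue, page_table: Dict[int, int],
                 memory: Dict[int, int]) -> None:
        self.eq, self.page_table, self.memory = eq, page_table, memory
        self.entries: Dict[int, int] = {}  # cached VPN -> PFN mappings

    def send(self, vaddr: int, done: Callable[[Response], None]) -> None:
        """Step 2: accept the request; the CPU gets no immediate feedback."""
        if (vaddr >> PAGE_SHIFT) in self.entries:
            self.eq.schedule(self.HIT_LATENCY, lambda: self._access(vaddr, done))
        else:
            self.eq.schedule(self.WALK_LATENCY, lambda: self._walk(vaddr, done))

    def _walk(self, vaddr: int, done: Callable[[Response], None]) -> None:
        """Step 3: the walk happens in the memory system, not in the CPU."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.page_table:
            # Skip the memory access and report the fault in the response.
            done(Response(vaddr, fault="page fault"))
            return
        self.entries[vpn] = self.page_table[vpn]
        self._access(vaddr, done)

    def _access(self, vaddr: int, done: Callable[[Response], None]) -> None:
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        paddr = (self.entries[vaddr >> PAGE_SHIFT] << PAGE_SHIFT) | offset
        resp = Response(vaddr, paddr, self.memory.get(paddr, 0))
        self.eq.schedule(self.MEM_LATENCY, lambda: done(resp))


def run_demo() -> List[Tuple[int, Response]]:
    eq = EventQueue()
    tlb = DecoupledTLB(eq, page_table={0x1: 0x80}, memory={0x80010: 42})
    log: List[Tuple[int, Response]] = []

    def complete(resp: Response) -> None:
        # Step 4: the request comes back, possibly carrying a fault.
        log.append((eq.now, resp))

    tlb.send(0x1010, complete)  # step 1: mapped page, TLB miss then walk
    tlb.send(0x5000, complete)  # step 1: unmapped page, comes back as a fault
    eq.run()
    return log
```

Note that the CPU side only hands over a virtual address and later gets a Response back, so anything that wants the physical address (load-store forwarding, for instance) would either have to wait for the response or work on virtual addresses.
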

Gabe

Gabe Black wrote:
> Steve Reinhardt wrote:
>
>> I haven't looked at the code... how is the x86 page-table walk handled
>> currently? Is it done in microcode or do we have a "hardware" state
>> machine for it? It seems to me that in the long run we want an
>> autonomous HW page-table walker, and that the idea of a "TLB miss
>> fault" for x86 should go away.
>>
>
> There's a "hardware" state machine that's attached to the TLB. I agree
> that the TLB miss fault should go away.
>
>
>> One way to change the CPU/memory interface that might not be too
>> disruptive to the CPU models and would also mirror a real HW
>> implementation more closely (always a good sign, IMO) would be simply
>> to push translation to the other side of the decoupled callback
>> interface. In Gabe's model, this would be (for timing mode):
>>
>> 1. Instruction generates request.
>> 2. CPU asks TLB/cache to translate and satisfy request. There is no
>> immediate feedback.
>> 3. Get coffee while request is handled.
>> 4. The request comes back, possibly indicating a fault. If there's a
>> fault, handle it; if not, finish the instruction.
>>
>> Then all of the translate/page-table walk/skip cache on page fault
>> etc. stuff happens concurrently in the memory system in step 3.
>>
>> I haven't thought through what this implies in detail... is the TLB
>> now a first-class memory-system object with a Port interface that sits
>> between the CPU and the cache? If that's too much overhead, is there
>> a better way to do it?
>>
>
> I like this idea and think we should head in that direction. There are
> two possible concerns with doing things this way though. First, the CPU
> won't be able to get at the physical address of a request as easily as
> before, so it might not be able to, for instance, do load-store
> forwarding as effectively. That's speculation on my part since I'm not
> sure how it's done in o3 or in a real CPU. As a matter of fact that may
> be based around virtual addresses anyway to cut the TLB lookup out of
> the critical path. Second, things could get more complicated as far as
> virtually/physically tagged/indexed and back probing and whatnot, and
> also dealing with coherence. I'm not familiar enough with the details of
> those systems to be able to predict what the complications might be.
>
> Those complications aside, though, I think moving the TLB out of the CPU
> and into the memory system is a good thing.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
