On Tue, Apr 21, 2009 at 6:26 PM, Korey Sewell <[email protected]> wrote:
>> That's not what I mean. What I'm saying is, simulate the timing of a
>> TLB stage, but do the functional access with the memory stage. I.e.
>> split it for timing purposes, but leave it together for functional
>> reasons. I'd be surprised if this does not work since the timing of
>> TLB accesses at that granularity shouldn't have much of an impact on
>> the program. I think Steve agreed with me on this one. (right Steve?)
>
> Yea, I think we are misunderstanding each other here. I guess I'm not
> exactly sure what you are getting at or what point you are arguing for
> (since the conversation got restarted again I may be lost in translation).
>
> For your point about TLB accesses and timing, I thought that we had
> resolved that the timing of TLBs does have an impact on the program, which
> is why Gabe went through making a "translateTiming" access for the TLB and
> then also making SE mode use a TLB.

The timing of a TLB miss definitely has an impact. For all the ISAs we've
done prior to x86, TLB misses were handled in software; the translate()
method either was a hit or just signaled a miss to be handled later, so we
didn't need to do anything special to model their latency. For x86, TLB
misses are handled in hardware, and translate() could encapsulate a
HW-serviced TLB miss and page table walk. That's why we needed to add
translateTiming().

The timing of TLB hits doesn't matter as much; it's a small fixed delay,
like integer ALU accesses or L1 cache hits, and that latency is pretty much
designed into the pipeline, so as long as your pipeline design accounts for
that latency properly you don't have to model it very explicitly. Nate's
point is that the spot in the pipeline where you account for the latency of
a TLB hit doesn't need to be exactly the same spot where you functionally
do the TLB access. For hits, I don't think there's any loss of accuracy at
all.
Likewise for misses on ISAs with SW TLB miss handling, since the TLB miss
handler won't get invoked until the instruction commits anyway. The only
case where there might be a slight inaccuracy is for TLB misses on x86,
which might get kicked off a cycle or two later than they should, but
relative to the overall cost of a TLB miss this is very much in the noise
(and certainly swamped by other inaccuracies we don't even know about).

> Also, I figure that if there are situations where you don't want to use
> the TLB then it makes sense to not continuously access the TLB object.

I'd be willing to bet that there are no longer any interesting platforms
anywhere that don't use TLBs at all. Really low-end systems may play some
tricks with a small number of fixed large pages to eliminate most TLB
misses, but anything that wants to provide security among multiple
processes needs virtual memory of some form, and even relatively cheap cell
phones still let you download Java games, so I bet they're using their
TLBs.

> And then lastly, in a situation where you have a # of dependent memory
> accesses waiting, it might be better for them to translate early if there
> is going to possibly be a time associated with a TLB miss/hit (a situation
> that gets exacerbated with more threads on 1 CPU, I would assume)...

This seems like a pretty unlikely design, but even if you did want to model
this, you could still just model the timing effects of the early TLB access
and defer the functional TLB access until later, with the same impact on
timing accuracy I mentioned above: I believe only x86 TLB misses would see
any inaccuracy at all, and even then it would be minor.

> So that's why currently I have instructions having to request the TLB and
> the cache as separate entities, which forced me to add "getMemFlags" and
> "memAccSize" to the Instruction.
> That implementation I have now works well, but potentially there's a
> better, non-intrusive solution to the instruction object.
>
> Or it sounds like people want to just X out that functionality and always
> force a memory access to be tied to a TLB access on that same cycle.

It's not that we *want* the TLB access to be tied to the memory access,
just that what you have now is a significant departure from the way the
StaticInst objects currently interact with the CPU model, and what we're
arguing is that these new functions and this additional complication seem
unnecessary. If it were the case that it seemed truly important to be able
to separate the functional TLB access from the functional memory access, or
that there was a clean way to do that consistent with the way things work
currently, then I don't think we'd be objecting.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
