On Tue, Apr 21, 2009 at 6:26 PM, Korey Sewell <[email protected]> wrote:
>> That's not what I mean. What I'm saying is, simulate the timing of a
>> TLB stage, but do the functional access with the memory stage. I.e.
>> split it for timing purposes, but leave it together for functional
>> reasons. I'd be surprised if this does not work since the timing of
>> TLB accesses at that granularity shouldn't have much of an impact on
>> the program. I think Steve agreed with me on this one. (right Steve?)
>
> Yea, I think we are misunderstanding each other here. I guess I'm not
> exactly sure what you are getting at or what point you are arguing for
> (since the conversation got restarted again I may be lost in translation).
>
> For your point about TLB accesses and timing, I thought that we had
> resolved that the timing of TLBs does have an impact on the program, which
> is why Gabe went through making a "translateTiming" access for the TLB and
> then also making SE mode use a TLB.

The timing of a TLB miss definitely has an impact. For all the ISAs we've
done prior to x86, TLB misses were handled in software; the translate()
method either was a hit or just signaled a miss to be handled later, so we
didn't need to do anything special to model their latency. For x86, TLB
misses are handled in hardware, and translate() could encapsulate a
HW-serviced TLB miss and page table walk. That's why we needed to add
translateTiming().

The timing of TLB hits doesn't matter as much; it's a small fixed delay,
like integer ALU accesses or L1 cache hits, and that latency is pretty much
designed into the pipeline, so as long as your pipeline design accounts for
that latency properly you don't have to model it very explicitly. Nate's
point is that the spot in the pipeline where you account for the latency of
a TLB hit doesn't need to be exactly the same spot where you functionally
do the TLB access. For hits, I don't think there's any loss of accuracy at
all.
Likewise for misses on ISAs with SW TLB miss handling, since the TLB miss
handler won't get invoked until the instruction commits anyway. The only
case where there might be a slight inaccuracy is for TLB misses on x86,
which might get kicked off a cycle or two later than they should, but
relative to the overall cost of a TLB miss this is very much in the noise
(and certainly swamped by other inaccuracies we don't even know about).

> Also, I figure that if there are situations where you don't want to use
> the TLB then it makes sense to not continuously access the TLB object.

I'd be willing to bet that there are no longer any interesting platforms
anywhere that don't use TLBs at all. Really low-end systems may play some
tricks with a small number of fixed large pages to eliminate most TLB
misses, but anything that wants to provide security among multiple
processes needs virtual memory of some form, and even relatively cheap cell
phones still let you download Java games, so I bet they're using their
TLBs.

> And then lastly, in a situation where you have a # of dependent memory
> accesses waiting, it might be better for them to translate early if there
> is going to possibly be a time associated with a TLB miss/hit (a situation
> that gets exacerbated with more threads on 1 CPU, I would assume)...

This seems like a pretty unlikely design, but even if you did want to model
this, you could still just model the timing effects of the early TLB access
and defer the functional TLB access until later, with the same impact on
timing accuracy I mentioned above: I believe only x86 TLB misses would see
any inaccuracy at all, and even then it would be minor.

> So that's why currently I have instructions having to request the TLB and
> the cache as separate entities, which forced me to add "getMemFlags" and
> "memAccSize" to the Instruction.
> That implementation I have now works well, but potentially there's a
> better, non-intrusive solution to the instruction object.
>
> Or it sounds like people want to just X out that functionality and always
> force a memory access to be tied to a TLB access on that same cycle.

It's not that we *want* the TLB access to be tied to the memory access,
just that what you have now is a significant departure from the way the
StaticInst objects currently interact with the CPU model, and what we're
arguing is that these new functions and this additional complication seem
unnecessary. If it were the case that it seemed truly important to be able
to separate the functional TLB access from the functional memory access, or
that there was a clean way to do that consistent with the way things work
currently, then I don't think we'd be objecting.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
