On 04/23/12 14:36, Steve Reinhardt wrote:
> Great, sounds like we're pretty much thinking along the same lines.
>
> On Mon, Apr 23, 2012 at 11:47 AM, Gabe Black <[email protected]> wrote:
>
>> We'll have to see how it performs, I guess. One nice thing is that the
>> page cache indirectly finds the same instruction in different places.
>
> You mean the decoder cache, not the page cache, right?

Yeah, I think so. Conceptually they're two different things, but their
implementations are intertwined so I don't really think of them as separate.


>
>
>> That saves memory because we don't have two copies of it floating
>> around, and it increases our hit rate a bit I'm sure. The question is by
>> how much. I'm guessing the memory savings is more than the increase in
>> hit rate percentage wise.
>>
> I had forgotten about this aspect, but it's a pretty important one.
>  There's really no reason at all that there should be two identical
> StaticInst instances representing the same machine instruction (with
> context) in the system.  I think that you get this whether you have a cache
> (or set of caches) that operate on the raw machine code + context or on the
> "ExtMachInst" (or whatever we call the output of the predecoder... I'll
> keep calling it ExtMachInst for convenience).  I believe the key thing is
> just that you need a purely state-based cache backing up the page-based
> cache.
>
> Off the top of my head, the only justification for having this state-based
> cache behind the predecoder would be if there are multiple binary encodings
> for the same identical instruction and you wanted them all to share the
> same StaticInst.  I believe this could happen with x86 (where some prefixes
> can be effectively no-ops), but probably not with other ISAs (at least not
> commonly). Even if you did this, you'd have to make sure that those no-op
> prefixes were filtered out of the ExtMachInst, and then you'd have to make
> sure you could deal with N:1 mappings from binary encodings to StaticInsts,
> which could be a minor pain.  So even in this case I don't see where having
> the cache behind the predecoder is a win.
>
> Of course, you still need to combine the machine instruction with the
> context before doing the cache lookup.  When all the predecoder does is
> trivially perform this combining (as in Alpha) then it works out that
> having the cache after the predecoder is basically the same as having it
> before.
>
> Also, if a significant fraction of hits are absorbed by the page cache
> (which I expect is true), then I don't see a benefit from having separate
> state-based caches vs. just having a single cache that has the full context
> as part of the key.  Ideally the hash function should do a good job of
> dealing with the contextual state, and if lookups aren't extremely frequent
> then the overhead of calculating the hash on the larger state shouldn't be
> a big deal.

There are two reasons not to make context part of the key. First, you
have to keep copying it into your ExtMachInst over and over even
though it's always the same thing. Second, you need a more complex
hash function, or one that does more work, to look up only keys that
match the context, excluding the same impossible matches on every
lookup. If you make the context implicit, either through which cache
you pick or through the fact that a cache only holds compatible
entries, you can ignore the context entirely. That makes every lookup
cheaper, and lookups happen a lot.


>
>
>>>> What I'm planning to do is to keep track of how many
>>>> and what bytes were at a particular PC with whatever contextualizing
>>>> state like operand size, operating mode, etc. When an instruction is
>>>> being fed into the predecoder, it will just check to see if the first n
>>>> bytes are the same, and if so skip all the way to the static inst. If
>>>> they aren't or if the contextualizing state changed and the cache was
>>>> thrown out, then it falls back to the existing mechanism.
>>>>
>>> This is just a minor extension to the current decode page cache, right?
>> Conceptually minor, but I'm still working out a way to tease the decode
>> cache apart enough that it can be adjusted like that without making a
>> mess. I haven't spent a lot of time on it yet though, so it may just
>> take a little more thought.
>>
> OK, sounds good.  Dealing with context in the page cache seems like a more
> interesting problem.  For example, I could see having multiple page caches
> indexed by context and swapping them in and out on context changes to avoid
> having to check the context on every access.

Yeah, this is the sort of thing I'm thinking of for the pre-predecoder
cache too.


>
>
>>> Also, can you clarify what you mean by "the cache was thrown out"?  Seems
>>> like you might want to switch to a different cache, but clearing the
>> cache
>>> on every context state change might not be a good idea.  I was thinking
>> you
>>> were planning to have a separate decode cache for each contextual state
>>> setting.  Actually the new design should allow each ISA to choose
>>> independently what parts of the decode context it wants to use as part of
>>> the cache tag/key and what parts it wants to use to select independent
>>> caches (or even when it just wants to flush the caches), right?
>>>
>>> Probably we should be clear when we're talking about the page cache and
>>> when we're talking about the decode cache... you may want different
>>> policies on the two of them.  I'm still not convinced that flushing on
>>> context changes will be a good idea for either one though.  Figuring out
>>> how to optimally manage context for the page cache could be a little
>> tricky
>>> though.
>> This would be for slow changing context which we would predecipher to
>> avoid having to look at it over and over again. These would be things
>> that change on, say, process changes or switching in and out of the
>> kernel. If those change, it may make sense to just chuck the cache since
>> there may be too many combinations to keep explicit storage for. Maybe
>> we could cache our caches and keep the last 3 or 4 or something. I'd
>> imagine there aren't very many combinations that actually get used for a
>> given simulation.
>>
> In general our memory overhead is pretty low, so my inclination would be to
> just keep all the decoded instructions around.  I'm guessing that whatever
> context you think is slowly or rarely changing, there's probably some
> pathological case where it changes faster than you think it should.  In
> addition, if we have a state-based cache that just uses the full
> context+machine instruction as an index, as I proposed above, there's never
> a need to flush it.  Depending on how the page cache is handled, you may
> want to limit what you keep there, but even in that case I'd be biased
> toward keeping everything just for simplicity's sake.

My concern is ending up with a sparsely populated array, for instance.
I haven't looked at the actual numbers, but if we have, say, 100,000
possible context sets, then we'd have maybe 99,995 unused caches/cache
pointers/whatever. A context-indexed hash as a backing store for the
caches would eliminate that problem. Actually, I think that would
probably work out pretty well.

>
>
>>> Seems like you really want an opaque decode object where all the caching
>> is
>>> completely hidden inside the object... so the ISA can internally decide
>> to
>>> have decode caches associated 1:1 with decode functions, but wouldn't
>>> necessarily have to do it that way.
>> Yeah, I want to give flexibility to the ISAs to figure out what makes
>> sense for them. I don't want it to be too loosely coupled, though,
>> because then there's a virtual function call or something like that, and
>> it gets more cumbersome to set up since there are more pieces. As I
>> said, though, I'm not super happy with how what I've got is working out.
>> Once I have it more together I'll post some patches for comment.
>
> Nice, I'll look forward to that.
>
> Steve
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
