On 04/24/12 08:42, Steve Reinhardt wrote:
> On Tue, Apr 24, 2012 at 1:29 AM, Gabe Black <[email protected]> wrote:
>
>> On 04/23/12 14:36, Steve Reinhardt wrote:
>>> Great, sounds like we're pretty much thinking along the same lines.
>>>
>>> On Mon, Apr 23, 2012 at 11:47 AM, Gabe Black <[email protected]>
>> wrote:
>>>> We'll have to see how it performs, I guess. One nice thing is that the
>>>> page cache indirectly finds the same instruction in different places.
>>> You mean the decoder cache, not the page cache, right?
>> Yeah, I think so. Conceptually they're two different things, but their
>> implementations are intertwined so I don't really think of them as
>> separate.
>>
> It's easy for me to think of them as separate because the state-based
> decoder cache has been there forever (and I may have written it myself),
> while the page-based cache was grafted in front of it later and I never
> really got familiar with that code.
>
> In any case, we should think of them separately at the design level, since
> odds are good we want to treat them differently.
>
>
>>> Also, if a significant fraction of hits are absorbed by the page cache
>>> (which I expect is true), then I don't see a benefit from having separate
>>> state-based caches vs. just having a single cache that has the full
>> context
>>> as part of the key.  Ideally the hash function should do a good job of
>>> dealing with the contextual state, and if lookups aren't extremely
>> frequent
>>> then the overhead of calculating the hash on the larger state shouldn't
>> be
>>> a big deal.
>> There are two reasons to not make context part of the key. First, you
>> have to keep copying it into your ExtMachInst over and over and over
>> even though it's always the same thing. Second, you have to have a more
>> complex hash function and/or one that does more work to look up only
>> keys that match the context when you're excluding the same possible
>> matches over and over and over. If you make the context implicit by what
>> cache you pick or the fact that nothing is in there except things that
>> are compatible, you can just ignore the context. That makes every
>> lookup easier, and lookups happen a lot.
>>
> That's why I said "if a significant fraction of hits are absorbed by the
> page cache"... if you were doing this on every instruction, the issues you
> bring up would matter, but I'm not convinced they matter if you only do it
> on a page cache miss.

They matter because you have to use context in the page cache too.
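For concreteness, here's a rough sketch of the "context implicit in which cache you pick" idea, combined with the context-indexed hash as a backing store we discuss further down. All names here are hypothetical, not actual gem5 code — just a sketch under those assumptions:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

struct StaticInstStub { uint64_t id; };  // stand-in for a decoded StaticInst

using MachInst = uint64_t;  // stand-in for raw instruction bytes

// One cache per context. Inside it, keys are just the instruction bytes;
// the context never has to be copied into the key or hashed on lookup.
struct PerContextCache {
    std::unordered_map<MachInst, StaticInstStub> insts;
};

struct DecodeCache {
    // Backing store: context-indexed hash of caches, so sparse contexts
    // don't cost a huge mostly-empty array of cache pointers.
    std::unordered_map<uint64_t, PerContextCache> byContext;
    PerContextCache *current = nullptr;
    uint64_t curContext = ~0ULL;

    // Consulted only when the contextualizing state (mode, operand
    // size, etc.) actually changes, not on every lookup.
    void setContext(uint64_t ctx) {
        if (ctx != curContext) {
            current = &byContext[ctx];  // pointers survive rehash
            curContext = ctx;
        }
    }

    // Hot path: context is implicit in which cache we already picked.
    // setContext() must have been called at least once first.
    StaticInstStub *lookup(MachInst bytes) {
        auto it = current->insts.find(bytes);
        return it == current->insts.end() ? nullptr : &it->second;
    }

    void insert(MachInst bytes, StaticInstStub si) {
        current->insts.emplace(bytes, si);
    }
};
```

The point being that the steady-state lookup hashes only the bytes, and the context cost is paid once per context switch instead of once per instruction.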

>
>
>>>>>> What I'm planning to do is to keep track of how many
>>>>>> and what bytes were at a particular PC with whatever contextualizing
>>>>>> state like operand size, operating mode, etc. When an instruction is
>>>>>> being fed into the predecoder, it will just check to see if the first
>> n
>>>>>> bytes are the same, and if so skip all the way to the static inst. If
>>>>>> they aren't or if the contextualizing state changed and the cache was
>>>>>> thrown out, then it falls back to the existing mechanism.
>>>>>>
>>>>> This is just a minor extension to the current decode page cache, right?
>>>> Conceptually minor, but I'm still working out a way to tease the decode
>>>> cache apart enough that it can be adjusted like that without making a
>>>> mess. I haven't spent a lot of time on it yet though, so it may just
>>>> take a little more thought.
>>>>
>>> OK, sounds good.  Dealing with context in the page cache seems like a
>> more
>>> interesting problem.  For example, I could see having multiple page
>> caches
>>> indexed by context and swapping them in and out on context changes to
>> avoid
>>> having to check the context on every access.
>> Yeah, this is the sort of thing I'm thinking of for the pre-predecoder
>> cache too.
>>
> Hmm, we may be talking past each other... when I say "page cache", I mean
> the pre-predecoder cache.
>
> To summarize, right now what I am envisioning is:
>
> page cache --> predecoder --> state-based cache --> decoder
>
> Are you thinking of something different?

Yes. Context always matters, not just once you get past the predecoder.
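To make that concrete, here's roughly the flow I have in mind — your pipeline, but with the context folded into the key at the page-cache stage too, not just past the predecoder. Again, all names and types are made up for illustration, not real gem5 interfaces:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

using Addr = uint64_t;
using Context = uint64_t;

struct StaticInst { int opcode; };  // stand-in for a decoded instruction

// A (value, context) pair used as the key at both cache levels.
struct Key {
    uint64_t a, b;
    bool operator==(const Key &o) const { return a == o.a && b == o.b; }
};
struct KeyHash {
    size_t operator()(const Key &k) const {
        return std::hash<uint64_t>()(k.a ^ (k.b * 0x9e3779b97f4a7c15ULL));
    }
};

std::unordered_map<Key, StaticInst, KeyHash> pageCache;   // (pc, ctx) -> inst
std::unordered_map<Key, StaticInst, KeyHash> stateCache;  // (bytes, ctx) -> inst
int fullDecodes = 0;  // count trips through the real decoder

// Stand-ins for the real predecode and decode steps.
uint64_t predecode(Addr pc) { return pc * 31; }  // fake "bytes at pc"
StaticInst decodeInst(uint64_t bytes) { ++fullDecodes; return {int(bytes & 0xff)}; }

StaticInst decode(Addr pc, Context ctx) {
    Key pk{pc, ctx};
    if (auto it = pageCache.find(pk); it != pageCache.end())
        return it->second;                          // page-cache hit: skip everything
    uint64_t bytes = predecode(pc);
    Key sk{bytes, ctx};
    if (auto it = stateCache.find(sk); it != stateCache.end()) {
        pageCache.emplace(pk, it->second);          // warm the page cache
        return it->second;
    }
    StaticInst si = decodeInst(bytes);              // slow path: full decode
    stateCache.emplace(sk, si);
    pageCache.emplace(pk, si);
    return si;
}
```

Note that a lookup with the same PC but a different context misses at *both* levels, which is why I keep saying the context has to be accounted for in front of the predecoder as well.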

>
>
>>> In general our memory overhead is pretty low, so my inclination would be
>> to
>>> just keep all the decoded instructions around.  I'm guessing that
>> whatever
>>> context you think is slowly or rarely changing, there's probably some
>>> pathological case where it changes faster than you think it should.  In
>>> addition, if we have a state-based cache that just uses the full
>>> context+machine instruction as an index, as I proposed above, there's
>> never
>>> a need to flush it.  Depending on how the page cache is handled, you may
>>> want to limit what you keep there, but even in that case I'd be biased
>>> toward keeping everything just for simplicity's sake.
>> My concern is a sparsely populated array, for instance. I haven't looked
>> at the actual numbers, but if we have, say, 100,000 possible context
>> sets, then we'd have maybe 99,995 unused caches/cache pointers/whatever.
>> Maybe a context indexed hash as a backing store for caches would
>> eliminate this problem. Actually I think that would probably work out
>> pretty well.
>>
> Yeah, I agree, if the context is sparse then another hash_map is called for.
>
> Steve
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
