On 04/24/12 08:42, Steve Reinhardt wrote:
> On Tue, Apr 24, 2012 at 1:29 AM, Gabe Black <[email protected]> wrote:
>
>> On 04/23/12 14:36, Steve Reinhardt wrote:
>>> Great, sounds like we're pretty much thinking along the same lines.
>>>
>>> On Mon, Apr 23, 2012 at 11:47 AM, Gabe Black <[email protected]> wrote:
>>>> We'll have to see how it performs, I guess. One nice thing is that the
>>>> page cache indirectly finds the same instruction in different places.
>>> You mean the decoder cache, not the page cache, right?
>> Yeah, I think so. Conceptually they're two different things, but their
>> implementations are intertwined, so I don't really think of them as
>> separate.
>>
> It's easy for me to think of them as separate because the state-based
> decoder cache has been there forever (and I may have written it myself),
> while the page-based cache was grafted in front of it later and I never
> really got familiar with that code.
>
> In any case, we should think of them separately at the design level, since
> odds are good we want to treat them differently.
>
>>> Also, if a significant fraction of hits are absorbed by the page cache
>>> (which I expect is true), then I don't see a benefit from having separate
>>> state-based caches vs. just having a single cache that has the full
>>> context as part of the key. Ideally the hash function should do a good
>>> job of dealing with the contextual state, and if lookups aren't extremely
>>> frequent then the overhead of calculating the hash on the larger state
>>> shouldn't be a big deal.
>> There are two reasons not to make context part of the key. First, you
>> have to keep copying it into your ExtMachInst over and over and over,
>> even though it's always the same thing. Second, you have to have a more
>> complex hash function, and/or one that does more work, to look up only
>> keys that match the context, when you're excluding the same possible
>> matches over and over and over.
>> If you make the context implicit in which cache you pick, or in the fact
>> that nothing is in the cache except things that are compatible, you can
>> just ignore the context. That makes things easier on every lookup, which
>> happens a lot.
>>
> That's why I said "if a significant fraction of hits are absorbed by the
> page cache"... if you were doing this on every instruction, the issues you
> bring up would matter, but I'm not convinced they matter if you only do
> this on a page cache miss.
They matter because you have to use context in the page cache too.

>>>>>> What I'm planning to do is to keep track of how many and what bytes
>>>>>> were at a particular PC, along with whatever contextualizing state
>>>>>> like operand size, operating mode, etc. When an instruction is being
>>>>>> fed into the predecoder, it will just check to see if the first n
>>>>>> bytes are the same, and if so skip all the way to the static inst. If
>>>>>> they aren't, or if the contextualizing state changed and the cache was
>>>>>> thrown out, then it falls back to the existing mechanism.
>>>>>>
>>>>> This is just a minor extension to the current decode page cache, right?
>>>> Conceptually minor, but I'm still working out a way to tease the decode
>>>> cache apart enough that it can be adjusted like that without making a
>>>> mess. I haven't spent a lot of time on it yet, though, so it may just
>>>> take a little more thought.
>>>>
>>> OK, sounds good. Dealing with context in the page cache seems like a
>>> more interesting problem. For example, I could see having multiple page
>>> caches indexed by context and swapping them in and out on context
>>> changes to avoid having to check the context on every access.
>> Yeah, this is the sort of thing I'm thinking of for the pre-predecoder
>> cache too.
>>
> Hmm, we may be talking past each other... when I say "page cache", I mean
> the pre-predecoder cache.
>
> To summarize, right now what I am envisioning is:
>
>     page cache --> predecoder --> state-based cache --> decoder
>
> Are you thinking of something different?

Yes. Context always matters, not just once you get past the predecoder.

>>> In general our memory overhead is pretty low, so my inclination would be
>>> to just keep all the decoded instructions around. I'm guessing that
>>> whatever context you think is slowly or rarely changing, there's probably
>>> some pathological case where it changes faster than you think it should.
>>> In addition, if we have a state-based cache that just uses the full
>>> context+machine instruction as an index, as I proposed above, there's
>>> never a need to flush it. Depending on how the page cache is handled,
>>> you may want to limit what you keep there, but even in that case I'd be
>>> biased toward keeping everything just for simplicity's sake.
>> My concern is a sparsely populated array, for instance. I haven't looked
>> at the actual numbers, but if we have, say, 100,000 possible context
>> sets, then we'd have maybe 99,995 unused caches/cache pointers/whatever.
>> Maybe a context-indexed hash as a backing store for caches would
>> eliminate this problem. Actually, I think that would probably work out
>> pretty well.
>>
> Yeah, I agree, if the context is sparse then another hash_map is called
> for.
>
> Steve

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
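[Editorial aside: the pre-predecoder fast path Gabe describes, remembering which bytes sat at a PC and skipping straight to the static inst when they match, might look like the sketch below. Names and types are hypothetical, `int` stands in for StaticInstPtr, and a real implementation would live in the x86 predecoder.]

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// One entry per PC in a context-specific table. Any change in the
// contextualizing state is assumed to have swapped out or flushed the
// whole table, so context needn't be rechecked on each access.
struct PcEntry {
    uint8_t len = 0;                  // how many bytes were recorded
    std::array<uint8_t, 15> bytes{};  // the bytes themselves (x86 max is 15)
    int staticInst = -1;              // cached decode result; -1 == invalid
};

// If the first `len` bytes now at this PC match what was recorded last
// time, skip the predecoder entirely; otherwise signal a fall-back to
// the existing mechanism.
inline int
fastPath(const PcEntry &e, const uint8_t *fetch, size_t avail)
{
    if (e.staticInst < 0 || avail < e.len ||
        std::memcmp(e.bytes.data(), fetch, e.len) != 0)
        return -1;                    // miss: fall back to the predecoder
    return e.staticInst;              // hit: straight to the static inst
}
```

Keeping these entries in a hash keyed by context (rather than a flat array over all possible context sets) addresses the sparsity concern raised above: with 100,000 possible contexts and a handful in use, only the populated ones consume storage.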
