On 04/23/12 07:42, Steve Reinhardt wrote:
> This sounds pretty good to me. I have a few clarifying questions inline
> below.
>
> On Sun, Apr 22, 2012 at 11:58 PM, Gabe Black <[email protected]> wrote:
>
>> Hi folks. I'm working on some decoder changes along lines I've laid out
>> before, and I thought it would be a good idea to describe what those are
>> for, what they do, and how things are going so far.
>>
>> The main component of what I'm doing is to turn the decoder from a bare
>> function into an object with state. That allows it to keep track locally
>> of whether it's in full system or syscall emulation mode, whether it
>> should be in, say, 64-bit mode on x86 or thumb mode on ARM, and to
>> manage its instruction cache intelligently itself. Because the decoder
>> is then an object like the predecoder, and one essentially just pipes
>> into the other, I'm also consolidating them into a single object. That
>> should make the CPUs' lives easier, and it opens up opportunities for
>> the decoding process for a particular ISA to be smarter, since it's more
>> in control of the process and more of the inner workings are kept inside
>> the decoder itself.
>>
> Would the ExtMachInst class then no longer be exported outside of the ISA
> description? That would be nice, IMO. It was an OK hack when we were just
> ORing in another bit, but with what x86 is doing now, it would be nice if
> the CPU models could be oblivious to it.
I'm not completely sure if we can, but it would be nice. One problem would
be the no-op ExtMachInst the ISAs define so the CPUs can send an
ISA-specific no-op down the pipe. Really, though, there's no reason I can
think of that we don't just use a StaticInstPtr directly, or even have a
globally defined StaticInst that does nothing and avoid the ISA-specific
aspect altogether.

>> Merging the predecoder and decoder will be especially important for x86
>> because it should allow moving the decode cache/adding a new cache in
>> front of the predecoder. x86's predecoder is a lot more complex than
>> other ISAs', and it always runs for every instruction because its output
>> determines whether or not there's a hit in the decode cache. The
>> function which compares ExtMachInsts is even fairly complex, since it
>> compares an expanded, canonicalized instruction instead of just the
>> bytes it came from.
>
> Great! Strangely, just this morning I was remembering how I wanted us to
> get to the point where the decode cache could work on raw bytes and skip
> the predecoder.
>
> My guess is that if we can put a good enough cache in front of the
> predecoder, then having a cache between the predecoder and the decoder
> would become pointless (i.e., I'd favor moving the cache rather than
> adding a new one). We'll have to see how it performs, I guess.

One nice thing is that the page cache indirectly finds the same
instruction in different places. That saves memory because we don't have
two copies of it floating around, and I'm sure it increases our hit rate a
bit. The question is by how much. I'm guessing the memory savings is
bigger than the increase in hit rate, percentage-wise.

>> What I'm planning to do is to keep track of how many and what bytes were
>> at a particular PC, along with whatever contextualizing state like
>> operand size, operating mode, etc.
>> When an instruction is being fed into the predecoder, it will just
>> check to see if the first n bytes are the same, and if so skip all the
>> way to the static inst. If they aren't, or if the contextualizing state
>> changed and the cache was thrown out, then it falls back to the existing
>> mechanism.
>
> This is just a minor extension to the current decode page cache, right?

Conceptually minor, but I'm still working out a way to tease the decode
cache apart enough that it can be adjusted like that without making a
mess. I haven't spent a lot of time on it yet, though, so it may just take
a little more thought.

> Also, can you clarify what you mean by "the cache was thrown out"? Seems
> like you might want to switch to a different cache, but clearing the
> cache on every context state change might not be a good idea. I was
> thinking you were planning to have a separate decode cache for each
> contextual state setting. Actually, the new design should allow each ISA
> to choose independently what parts of the decode context it wants to use
> as part of the cache tag/key and what parts it wants to use to select
> independent caches (or even when it just wants to flush the caches),
> right?
>
> Probably we should be clear when we're talking about the page cache and
> when we're talking about the decode cache... you may want different
> policies on the two of them. I'm still not convinced that flushing on
> context changes will be a good idea for either one, though. Figuring out
> how to optimally manage context for the page cache could be a little
> tricky, though.

This would be for slow-changing context which we would pre-decipher to
avoid having to look at it over and over again. These would be things that
change on, say, process changes or switching in and out of the kernel. If
those change, it may make sense to just chuck the cache, since there may
be too many combinations to keep explicit storage for.
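To make the fast path concrete, here's a rough sketch of the kind of lookup being discussed: compare the first n bytes cached at a PC and skip straight to the static inst on a match, fall back to the full (pre)decode path otherwise, and chuck the cache when the slow-changing context changes. All names here (DecodeCache, DecodeContext, the StaticInst stub) are illustrative stand-ins, not gem5's actual classes:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>

// Hypothetical stand-ins for gem5 types; illustrative only.
struct StaticInst { int opcode; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Slow-changing decode context (e.g. operand size, operating mode).
struct DecodeContext {
    bool mode64;
    int opSize;
    bool operator==(const DecodeContext &o) const {
        return mode64 == o.mode64 && opSize == o.opSize;
    }
};

class DecodeCache {
  public:
    void setContext(const DecodeContext &ctx) {
        if (!(ctx == context_)) {
            cache_.clear();     // chuck the cache on a context change
            context_ = ctx;
        }
    }

    StaticInstPtr decode(uint64_t pc, const uint8_t *bytes, size_t len) {
        auto it = cache_.find(pc);
        if (it != cache_.end()) {
            const Entry &e = it->second;
            // Hit: same bytes at this PC as last time, skip the predecoder.
            if (e.len <= len && std::memcmp(e.bytes, bytes, e.len) == 0)
                return e.inst;
        }
        // Miss: fall back to the full predecode/decode path, fill the cache.
        StaticInstPtr inst = fullDecode(bytes, len);
        Entry e;
        e.len = len < kMaxBytes ? len : kMaxBytes;
        std::memcpy(e.bytes, bytes, e.len);
        e.inst = inst;
        cache_[pc] = e;
        return inst;
    }

  private:
    static const size_t kMaxBytes = 16;
    struct Entry {
        uint8_t bytes[kMaxBytes];
        size_t len;
        StaticInstPtr inst;
    };
    // Placeholder for the expensive predecode + decode pipeline.
    StaticInstPtr fullDecode(const uint8_t *bytes, size_t) {
        return std::make_shared<StaticInst>(StaticInst{bytes[0]});
    }
    std::map<uint64_t, Entry> cache_;
    DecodeContext context_{};
};
```

The byte compare is the whole tag check, so a hit never touches the predecoder at all; whether flushing on a context change is the right policy is exactly what's being debated above.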
Maybe we could cache our caches and keep the last 3 or 4 or something. I'd
imagine there aren't very many combinations that actually get used in a
given simulation.

>> So far I've made decoder objects for all the ISAs, made the parser
>> generate a member function for them, made full system a decoder-local
>> variable where appropriate, and merged the predecoder and decoder for
>> x86 and in the CPUs. Actually, I made the parser generate a decode
>> function which belongs to the decode cache itself. That way, the cache
>> can call into its decode function without intervention, and there's
>> always a one-to-one mapping between decode caches and decode functions.
>> The implementation isn't too bad, but it's a bit more convoluted than
>> I'd like. To avoid lots of duplicate code I've resorted to some
>> templating stuff that I don't really like either.
>
> Seems like you really want an opaque decode object where all the caching
> is completely hidden inside the object... so the ISA can internally
> decide to have decode caches associated 1:1 with decode functions, but
> wouldn't necessarily have to do it that way.

Yeah, I want to give the ISAs the flexibility to figure out what makes
sense for them. I don't want it to be too loosely coupled, though, because
then there's a virtual function call or something like that, and it gets
more cumbersome to set up since there are more pieces. As I said, though,
I'm not super happy with how what I've got is working out. Once I have it
more together I'll post some patches for comment.

Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
