This sounds pretty good to me. I have a few clarifying questions inline below.
On Sun, Apr 22, 2012 at 11:58 PM, Gabe Black <[email protected]> wrote:

> Hi folks. I'm working on some decoder changes which are along lines I've
> laid out before, and I thought it would be a good idea to describe what
> those are for, what they do, and how things are going so far.
>
> The main component of what I'm doing is to turn the decoder from a bare
> function into an object with state. That allows it to keep track locally
> of whether it's in full system or syscall emulation mode, whether it
> should be in, say, 64-bit mode on x86 or Thumb mode on ARM, and to manage
> its instruction cache intelligently itself. Because the decoder is then
> an object like the predecoder, and one essentially just pipes into the
> other, I'm also consolidating them into a single object. That should
> make the CPUs' lives easier, and it opens up opportunities for the
> decoding process for a particular ISA to be smarter, since it's more in
> control of the process and more of the inner workings are kept inside
> the decoder itself.

Would the ExtMachInst class then no longer be exported outside of the ISA
description? That would be nice, IMO. It was an OK hack when we were just
ORing in another bit, but with what x86 is doing now, it would be nice if
the CPU models could be oblivious to it.

> Merging the predecoder and decoder will be especially important for x86,
> because it should allow moving the decode cache/adding a new cache in
> front of the predecoder. X86's predecoder is a lot more complex than
> other ISAs', and it always runs for every instruction, because its
> output determines whether or not there's a hit in the decode cache. The
> function which compares ExtMachInsts is even fairly complex, since it
> compares an expanded, canonicalized instruction instead of just the
> bytes it came from.

Great! Strangely, just this morning I was remembering how I wanted us to
get to the point where the decode cache could work on raw bytes and skip
the predecoder.
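To make sure I'm following, here's a rough sketch of how I picture the stateful decoder. All of the names here are made up for illustration; I'm not assuming this matches your actual interface:

```cpp
#include <cassert>

// Toy sketch of a decoder as an object with state instead of a bare
// function. Names (Decoder, setMode64, etc.) are invented for
// illustration; this is not the actual gem5 interface.
class Decoder
{
  private:
    bool fullSystem; // FS vs. SE mode, tracked locally by the decoder
    bool mode64;     // e.g. 64-bit mode on x86, or Thumb mode on ARM

  public:
    explicit Decoder(bool fs) : fullSystem(fs), mode64(false) {}

    // The CPU reports context changes to the decoder, instead of the
    // decoder exporting that context (via ExtMachInst) to the CPU.
    void setMode64(bool on) { mode64 = on; }

    bool is64Bit() const { return mode64; }
    bool isFullSystem() const { return fullSystem; }

    // A decode(bytes, pc) member would live here too, consulting an
    // internal decode cache that the decoder manages itself.
};
```

The point being that the CPU only ever pokes context at the object, and everything downstream of that stays hidden.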
My guess is that if we can put a good enough cache in front of the
predecoder, then having a cache between the predecoder and the decoder
would become pointless (i.e., I'd favor moving the cache rather than
adding a new one).

> What I'm planning to do is to keep track of how many and what bytes were
> at a particular PC, along with whatever contextualizing state like
> operand size, operating mode, etc. When an instruction is being fed into
> the predecoder, it will just check to see if the first n bytes are the
> same, and if so skip all the way to the static inst. If they aren't, or
> if the contextualizing state changed and the cache was thrown out, then
> it falls back to the existing mechanism.

This is just a minor extension to the current decode page cache, right?

Also, can you clarify what you mean by "the cache was thrown out"? It
seems like you might want to switch to a different cache, but clearing
the cache on every context state change might not be a good idea. I was
thinking you were planning to have a separate decode cache for each
contextual state setting. Actually, the new design should allow each ISA
to choose independently which parts of the decode context it wants to use
as part of the cache tag/key and which parts it wants to use to select
independent caches (or even when it just wants to flush the caches),
right?

We should probably be clear about when we're talking about the page cache
and when we're talking about the decode cache... you may want different
policies on the two of them. I'm still not convinced that flushing on
context changes will be a good idea for either one, though. Figuring out
how to optimally manage context for the page cache could be a little
tricky, though.

> So far I've made decoder objects for all the ISAs, made the parser
> generate a member function for them, made full system a decoder local
> variable where appropriate, and merged the predecoder and decoder for
> x86 and in the CPUs.
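To check my understanding of the fast path you describe: something like the toy cache below, keyed on (PC, context), which compares the raw bytes before skipping straight to the static inst. All names and types here are invented; in particular I'm flattening the context to an integer key just for the sketch:

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>

// Toy sketch of a raw-byte decode cache sitting in front of the
// predecoder. Context (operand size, operating mode, ...) is part of
// the key rather than a reason to flush. Invented names throughout.
struct CacheEntry
{
    uint8_t len;        // how many bytes the instruction occupied
    uint8_t bytes[16];  // the raw bytes seen last time at this PC
    int inst;           // stand-in for a StaticInst pointer/index
};

class RawByteCache
{
  private:
    std::map<std::pair<uint64_t, uint32_t>, CacheEntry> cache;

  public:
    void
    record(uint64_t pc, uint32_t ctx, const uint8_t *b, uint8_t n, int inst)
    {
        CacheEntry e;
        e.len = n;
        std::memcpy(e.bytes, b, n);
        e.inst = inst;
        cache[std::make_pair(pc, ctx)] = e;
    }

    // Returns the cached inst if the first n bytes match what was seen
    // before; returns -1 to signal a fall-back to the full
    // predecode/decode path.
    int
    lookup(uint64_t pc, uint32_t ctx, const uint8_t *b) const
    {
        auto it = cache.find(std::make_pair(pc, ctx));
        if (it == cache.end())
            return -1;
        const CacheEntry &e = it->second;
        if (std::memcmp(e.bytes, b, e.len) != 0)
            return -1;
        return e.inst;
    }
};
```

With a structure like that, a context change just misses in the cache rather than invalidating it, which is the behavior I'd argue for.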
> Actually I made the parser generate a decode function which belongs to
> the decode cache itself. That way, the cache can call into its decode
> function without intervention, and there's always a one-to-one mapping
> between decode caches and decode functions. The implementation isn't too
> bad, but it's a bit more convoluted than I'd like. To avoid lots of
> duplicate code, I've resorted to some templating stuff that I don't
> really like either.

Seems like you really want an opaque decode object where all the caching
is completely hidden inside the object... so the ISA can internally
decide to have decode caches associated 1:1 with decode functions, but
wouldn't necessarily have to do it that way.

Steve
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
