Hi folks. I'm working on some decoder changes along the lines I've laid out before, and I thought it would be a good idea to describe what they're for, what they do, and how things are going so far.
The main component of what I'm doing is to turn the decoder from a bare function into an object with state. That allows it to keep track locally of whether it's in full system or syscall emulation mode and whether it should be in, say, 64 bit mode on x86 or thumb mode on ARM, and to manage its instruction cache intelligently itself. Because the decoder is then an object like the predecoder, and one essentially just pipes into the other, I'm also consolidating them into a single object. That should make the CPUs' lives easier, and it opens up opportunities for the decoding process for a particular ISA to be smarter, since it's more in control of the process and more of the inner workings are kept inside the decoder itself.

Merging the predecoder and decoder will be especially important for x86 because it should allow moving the decode cache, or adding a new cache, in front of the predecoder. X86's predecoder is a lot more complex than those of other ISAs, and it always runs for every instruction because its output determines whether or not there's a hit in the decode cache. Even the function which compares ExtMachInsts is fairly complex, since it compares an expanded, canonicalized instruction instead of just the bytes it came from.

What I'm planning to do is to keep track of how many bytes, and which bytes, were at a particular PC, along with whatever contextualizing state applies, like operand size, operating mode, etc. When an instruction is being fed into the predecoder, it will just check whether the first n bytes are the same, and if so skip all the way to the static inst. If they aren't, or if the contextualizing state changed and the cache was thrown out, then it falls back to the existing mechanism.

So far I've made decoder objects for all the ISAs, made the parser generate a member function for them, made full system a decoder local variable where appropriate, and merged the predecoder and decoder for x86 and in the CPUs.
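
To make the byte-comparison idea above a bit more concrete, here is a rough C++ sketch. It is only an illustration under assumptions, not the actual gem5 code: ByteCache, DecodeContext, and this cut-down StaticInst are hypothetical names, and the real contextualizing state would cover more than the two fields shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a decoded static instruction.
struct StaticInst { int id; };

// Hypothetical contextualizing state (operating mode, operand size, ...).
struct DecodeContext {
    uint8_t mode;
    uint8_t opSize;
    bool operator==(const DecodeContext &o) const {
        return mode == o.mode && opSize == o.opSize;
    }
};

// Hypothetical cache sitting in front of the predecoder, keyed by PC.
class ByteCache {
    struct Entry {
        std::vector<uint8_t> bytes;  // the raw bytes seen at this PC
        const StaticInst *inst;      // what those bytes decoded to
    };
    std::unordered_map<uint64_t, Entry> entries;
    DecodeContext ctx{0, 0};

  public:
    // If the contextualizing state changes, throw the whole cache out.
    void setContext(const DecodeContext &c) {
        if (!(c == ctx)) {
            entries.clear();
            ctx = c;
        }
    }

    // Return the cached static inst if the first n bytes at this PC are
    // unchanged; return nullptr so the caller can fall back to the
    // existing predecode/decode path otherwise.
    const StaticInst *lookup(uint64_t pc, const uint8_t *buf,
                             std::size_t avail) const {
        auto it = entries.find(pc);
        if (it == entries.end())
            return nullptr;
        const Entry &e = it->second;
        if (e.bytes.size() > avail)
            return nullptr;
        if (!std::equal(e.bytes.begin(), e.bytes.end(), buf))
            return nullptr;
        return e.inst;
    }

    // Record the n bytes that produced inst at this PC.
    void insert(uint64_t pc, const uint8_t *buf, std::size_t n,
                const StaticInst *inst) {
        entries[pc] = Entry{std::vector<uint8_t>(buf, buf + n), inst};
    }
};
```

The point of the sketch is that a hit costs one hash lookup and a short byte comparison, with no predecoding and no ExtMachInst comparison. A miss costs nothing beyond what the existing path already does.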
Actually, I made the parser generate a decode function which belongs to the decode cache itself. That way, the cache can call into its decode function without intervention, and there's always a one-to-one mapping between decode caches and decode functions. The implementation isn't too bad, but it's a bit more convoluted than I'd like. To avoid lots of duplicate code I've resorted to some templating stuff that I don't really like either.

Gabe
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
