Hi folks. I'm working on some decoder changes along the lines I've
laid out before, and I thought it would be a good idea to describe what
they're for, what they do, and how things are going so far.

The main component of what I'm doing is turning the decoder from a bare
function into an object with state. That lets it keep track locally of
whether it's in full system or syscall emulation mode and whether it
should be in, say, 64-bit mode on x86 or Thumb mode on ARM, and it lets
the decoder manage its instruction cache intelligently itself. Because
the decoder is then an object like the predecoder, and one essentially
just pipes into the other, I'm also consolidating them into a single
object. That should make the CPUs' lives easier, and it opens up
opportunities for the decoding process for a particular ISA to be
smarter, since it's more in control of the process and more of the inner
workings are kept inside the decoder itself.
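
As a rough illustration of the shape I mean (a toy sketch, not the actual
gem5 code; the class, its method names, and the fixed 2/4 byte
instruction lengths are all made up for the example), a merged
predecoder/decoder object that owns its mode state and its cache might
look like this:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a decoded instruction.
struct StaticInst { uint32_t encoding; bool thumb; };

class Decoder
{
  public:
    explicit Decoder(bool full_system) : fullSystem(full_system) {}

    bool isFullSystem() const { return fullSystem; }

    // Mode lives in the decoder, so the CPU doesn't pass it on every call.
    // Changing it invalidates anything decoded under the old mode.
    void setThumb(bool t)
    {
        if (t != thumb) {
            thumb = t;
            cache.clear();
        }
    }

    // "Predecoder" half: the CPU just feeds in fetched bytes.
    void moreBytes(const std::vector<uint8_t> &bytes)
    {
        buffer.insert(buffer.end(), bytes.begin(), bytes.end());
    }

    // "Decoder" half: returns an instruction once enough bytes are
    // buffered. Toy rule: 2 bytes in Thumb mode, 4 bytes otherwise.
    std::optional<StaticInst> decode()
    {
        const std::size_t len = thumb ? 2 : 4;
        if (buffer.size() < len)
            return std::nullopt;

        uint32_t raw = 0;
        for (std::size_t i = 0; i < len; ++i)
            raw |= static_cast<uint32_t>(buffer[i]) << (8 * i);
        buffer.erase(buffer.begin(),
                     buffer.begin() + static_cast<std::ptrdiff_t>(len));

        auto it = cache.find(raw);
        if (it == cache.end()) {
            ++misses;
            it = cache.emplace(raw, StaticInst{raw, thumb}).first;
        }
        return it->second;
    }

    unsigned missCount() const { return misses; }

  private:
    bool fullSystem;
    bool thumb = false;
    unsigned misses = 0;
    std::vector<uint8_t> buffer;
    std::unordered_map<uint32_t, StaticInst> cache;
};
```

The point is only that the CPU sees one object with a "here are some
bytes" call and a "give me an instruction" call, and everything else
(mode, buffering, caching) is the decoder's own business.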

Merging the predecoder and decoder will be especially important for x86
because it should allow moving the decode cache, or adding a new cache,
in front of the predecoder. x86's predecoder is a lot more complex than
those of other ISAs, and it runs for every instruction because its
output determines whether or not there's a hit in the decode cache. Even
the function which compares ExtMachInsts is fairly complex, since it
compares an expanded, canonicalized instruction instead of just the
bytes it came from. What I'm planning to do is keep track of how many
bytes, and which bytes, were at a particular PC, along with whatever
contextualizing state applies, like operand size, operating mode, etc.
When an instruction is being fed into the predecoder, it will just check
whether the first n bytes are the same, and if so skip all the way to
the static inst. If they aren't, or if the contextualizing state changed
and the cache was thrown out, it falls back to the existing mechanism.
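
To make the lookup concrete, here's a minimal sketch of that kind of
byte-keyed cache. It's only an illustration under my own assumptions:
ByteCache, Context, and the rest are hypothetical names, the context is
reduced to two small fields, and a null result stands for "fall back to
the existing predecoder path".

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not gem5 types.
struct StaticInst { int id; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// Contextualizing state the cached bytes were decoded under.
struct Context
{
    uint8_t opSize;
    uint8_t mode;
    bool operator==(const Context &o) const
    {
        return opSize == o.opSize && mode == o.mode;
    }
};

class ByteCache
{
  public:
    // A context change throws the whole cache out.
    void setContext(const Context &c)
    {
        if (!(c == ctx)) {
            ctx = c;
            entries.clear();
        }
    }

    // Remember which bytes (and so how many) produced inst at pc.
    void insert(uint64_t pc, std::vector<uint8_t> bytes, StaticInstPtr inst)
    {
        entries[pc] = Entry{std::move(bytes), std::move(inst)};
    }

    // Hit only if the first n fetched bytes match the n recorded bytes.
    // nullptr means: run the full predecoder as before.
    StaticInstPtr lookup(uint64_t pc,
                         const std::vector<uint8_t> &fetched) const
    {
        auto it = entries.find(pc);
        if (it == entries.end())
            return nullptr;
        const Entry &e = it->second;
        if (fetched.size() < e.bytes.size())
            return nullptr;
        if (!std::equal(e.bytes.begin(), e.bytes.end(), fetched.begin()))
            return nullptr;
        return e.inst;
    }

  private:
    struct Entry
    {
        std::vector<uint8_t> bytes;
        StaticInstPtr inst;
    };

    Context ctx{0, 0};
    std::unordered_map<uint64_t, Entry> entries;
};
```

The comparison on a hit is a plain byte compare of n bytes, which is the
whole attraction compared to building and comparing a full ExtMachInst.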

So far I've made decoder objects for all the ISAs, made the parser
generate a member function for them, made full system a decoder local
variable where appropriate, and merged the predecoder and decoder for
x86 and in the CPUs. More precisely, the function the parser generates
is a decode function which belongs to the decode cache itself. That way,
the cache can call into its decode function without intervention, and
there's always a one-to-one mapping between decode caches and decode
functions. The implementation isn't too bad, but it's a bit more
convoluted than I'd like. To avoid lots of duplicate code I've resorted
to some templating stuff that I don't really like either.
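
For a feel of the cache/decode-function pairing, here's one way it could
be expressed with templates. This is a simplified sketch and not what's
in my patches: the ISA traits structs, decodeInst, and the toy StaticInst
are hypothetical, with decodeInst standing in for the function the
parser would generate.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a decoded instruction.
struct StaticInst { std::string name; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// One cache type per ISA. The cache calls the ISA's decode function
// itself on a miss, so each cache is tied to exactly one decode function.
template <typename ISA>
class DecodeCache
{
  public:
    using ExtMachInst = typename ISA::ExtMachInst;

    StaticInstPtr decode(ExtMachInst mach)
    {
        auto it = cache.find(mach);
        if (it != cache.end())
            return it->second;
        ++misses;
        StaticInstPtr si = ISA::decodeInst(mach);
        cache.emplace(mach, si);
        return si;
    }

    unsigned missCount() const { return misses; }

  private:
    unsigned misses = 0;
    std::unordered_map<ExtMachInst, StaticInstPtr> cache;
};

// Two toy "ISAs"; decodeInst plays the role of the generated function.
struct ToyIsaA
{
    using ExtMachInst = uint32_t;
    static StaticInstPtr decodeInst(ExtMachInst m)
    {
        return std::make_shared<StaticInst>(
            StaticInst{"A:" + std::to_string(m)});
    }
};

struct ToyIsaB
{
    using ExtMachInst = uint16_t;
    static StaticInstPtr decodeInst(ExtMachInst m)
    {
        return std::make_shared<StaticInst>(
            StaticInst{"B:" + std::to_string(m)});
    }
};
```

The shared lookup logic is written once, and each ISA only supplies its
machine instruction type and its decode function.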

Gabe