This sounds pretty good to me.  I have a few clarifying questions inline
below.


On Sun, Apr 22, 2012 at 11:58 PM, Gabe Black <[email protected]> wrote:

> Hi folks. I'm working on some decoder changes which are along lines I've
> laid out before, and I thought it would be a good idea to describe what
> those are for, what they do, and how things are going so far.
>
> The main component of what I'm doing is to turn the decoder from a bare
> function to an object with state. That allows it to keep track of
> whether it's in full system or syscall emulation mode locally, if it
> should be in, say, 64 bit mode on x86 or thumb mode on ARM, and manage
> its instruction cache intelligently itself. Because the decoder is then
> an object like the predecoder and one essentially just pipes into the
> other, I'm also consolidating them into a single object. That should
> make the CPUs' lives easier, and it opens up opportunities for the
> decoding process for a particular ISA to be smarter since it's more in
> control of the process and more of the inner workings are kept inside
> the decoder itself.
>

Would the ExtMachInst class then no longer be exported outside of the ISA
description?  That would be nice, IMO.  It was an OK hack when we were just
ORing in another bit, but with what x86 is doing now, it would be nice if
the CPU models could be oblivious to it.
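To make the idea concrete, here is a minimal sketch of what such an opaque decoder object might look like. This is illustrative only, not gem5's actual interface: the names (Decoder, moreBytes, decode, StaticInst) and the trivial fixed-width "predecode" are assumptions for the example. The point is that mode bits and the cache live entirely inside the object, so the CPU never touches an ExtMachInst.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Stand-in for gem5's decoded-instruction type (hypothetical).
struct StaticInst { uint32_t opcode; };
using StaticInstPtr = std::shared_ptr<StaticInst>;

// The decoder owns all of its state (FS/SE mode, ISA mode bits,
// caches), so the CPU just feeds bytes and gets StaticInsts back.
class Decoder {
  public:
    void setFullSystem(bool fs) { fullSystem = fs; }
    void setThumb(bool t) { thumbMode = t; }

    // Feed one fetch chunk; returns true once a full instruction
    // has been accumulated (trivially true for fixed-width here).
    bool moreBytes(uint64_t pc, uint32_t data) {
        (void)pc;
        machInst = data;
        instReady = true;
        return instReady;
    }

    StaticInstPtr decode(uint64_t pc) {
        (void)pc;
        assert(instReady);
        instReady = false;
        // Internal cache keyed on the raw bytes plus decode context;
        // none of this is visible to the CPU model.
        uint64_t key = (uint64_t)machInst | ((uint64_t)thumbMode << 32);
        auto it = cache.find(key);
        if (it != cache.end())
            return it->second;
        auto inst = std::make_shared<StaticInst>(StaticInst{machInst});
        cache[key] = inst;
        return inst;
    }

  private:
    bool fullSystem = false;
    bool thumbMode = false;
    bool instReady = false;
    uint32_t machInst = 0;
    std::unordered_map<uint64_t, StaticInstPtr> cache;
};
```

A CPU model using this only ever sees moreBytes() and decode(); how (or whether) results are cached is the ISA's business.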

> Merging the predecoder and decoder will be especially important for x86
> because it should allow moving the decode cache/adding a new cache in
> front of the predecoder. X86's predecoder is a lot more complex than
> other ISAs, and it always runs for every instruction because its output
> determines whether or not there's a hit in the decode cache. The
> function which compares ExtMachInsts is even fairly complex since it
> compares an expanded, canonicalized instruction instead of just the
> bytes it came from.


Great!  Strangely just this morning I was remembering how I wanted us to
get to the point where the decode cache could work on raw bytes and skip
the predecoder.

My guess is that if we can put a good enough cache in front of the
predecoder then having a cache between the predecoder and the decoder would
become pointless (i.e., I'd favor moving the cache rather than adding a new
one).


> What I'm planning to do is to keep track of how many
> and what bytes were at a particular PC with whatever contextualizing
> state like operand size, operating mode, etc. When an instruction is
> being fed into the predecoder, it will just check to see if the first n
> bytes are the same, and if so skip all the way to the static inst. If
> they aren't or if the contextualizing state changed and the cache was
> thrown out, then it falls back to the existing mechanism.
>
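The byte-comparison fast path described above might be sketched roughly as follows. All names here (ByteCache, fastPath, record, the 15-byte buffer sized for x86's maximum instruction length) are hypothetical; the real version would also fold the contextualizing state into the check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// What we remember about the last instruction seen at a PC.
struct CachedEntry {
    uint8_t len;          // how many bytes the instruction consumed
    uint8_t bytes[15];    // the raw bytes seen last time (x86 max length)
    uint32_t staticInst;  // stand-in for a StaticInstPtr
};

class ByteCache {
  public:
    // If the first len bytes at pc are unchanged, skip straight to the
    // cached static inst; otherwise the caller falls back to the full
    // predecode/decode path.
    bool fastPath(uint64_t pc, const uint8_t *fetch, uint32_t *inst) {
        auto it = entries.find(pc);
        if (it == entries.end())
            return false;
        const CachedEntry &e = it->second;
        if (std::memcmp(fetch, e.bytes, e.len) != 0)
            return false;
        *inst = e.staticInst;
        return true;
    }

    // Called after a full decode to remember the bytes for next time.
    void record(uint64_t pc, const uint8_t *fetch, uint8_t len,
                uint32_t inst) {
        CachedEntry e;
        e.len = len;
        std::memcpy(e.bytes, fetch, len);
        e.staticInst = inst;
        entries[pc] = e;
    }

  private:
    std::unordered_map<uint64_t, CachedEntry> entries;
};
```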

This is just a minor extension to the current decode page cache, right?

Also, can you clarify what you mean by "the cache was thrown out"?  Seems
like you might want to switch to a different cache, but clearing the cache
on every context state change might not be a good idea.  I was thinking you
were planning to have a separate decode cache for each contextual state
setting.  Actually the new design should allow each ISA to choose
independently what parts of the decode context it wants to use as part of
the cache tag/key and what parts it wants to use to select independent
caches (or even when it just wants to flush the caches), right?

Probably we should be clear when we're talking about the page cache and
when we're talking about the decode cache... you may want different
policies on the two of them.  I'm still not convinced that flushing on
context changes will be a good idea for either one though.  Figuring out
how to optimally manage context for the page cache could be a little
tricky.
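The split suggested above (some context selecting independent caches, the rest folded into the tag) might look something like this. The names and the particular split (64-bit mode selects a cache, operand size goes into the key) are made up for illustration; each ISA would choose its own.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical decode context, split two ways by the ISA.
struct DecodeContext {
    uint8_t opSize;   // folded into the lookup key
    bool mode64;      // selects a separate cache entirely
};

class ContextCaches {
  public:
    // Returns the cached opcode via *out; false on miss.
    bool lookup(const DecodeContext &ctx, uint64_t bytes, uint32_t *out) {
        auto &cache = caches[ctx.mode64];
        auto it = cache.find({ctx.opSize, bytes});
        if (it == cache.end())
            return false;
        *out = it->second;
        return true;
    }

    void insert(const DecodeContext &ctx, uint64_t bytes, uint32_t opcode) {
        caches[ctx.mode64][{ctx.opSize, bytes}] = opcode;
    }

  private:
    using Key = std::pair<uint8_t, uint64_t>;
    // One cache per mode: switching modes switches caches instead of
    // flushing, so the other mode's entries survive intact.
    std::map<bool, std::map<Key, uint32_t>> caches;
};
```

With this structure, a context change that only flips mode64 costs nothing but a map switch, which is the behavior being argued for over flushing.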


> So far I've made decoder objects for all the ISAs, made the parser
> generate a member function for them, made full system a decoder local
> variable where appropriate, and merged the predecoder and decoder for
> x86 and in the CPUs. Actually I made the parser generate a decode
> function which belongs to the decode cache itself. That way, the cache
> can call into its decode function without intervention, and there's
> always a one to one mapping between decode caches and decode functions.
> The implementation isn't too bad, but it's a bit more convoluted than
> I'd like. To avoid lots of duplicate code I've resorted to some
> templating stuff that I don't really like either.
>

Seems like you really want an opaque decode object where all the caching is
completely hidden inside the object... so the ISA can internally decide to
have decode caches associated 1:1 with decode functions, but wouldn't
necessarily have to do it that way.

Steve
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
