Re: [m5-dev] X86 performance

Gabriel Michael Black Mon, 25 Oct 2010 16:17:41 -0700

Quoting Steve Reinhardt <[email protected]>:

On Fri, Oct 22, 2010 at 6:06 PM, Steve Reinhardt <[email protected]> wrote:

On Fri, Oct 22, 2010 at 3:59 PM, Steve Reinhardt <[email protected]> wrote:

I'd still really encourage you to work on cutting out the middleman
and find a way to go straight from raw bytes to StaticInsts via a
cache.


Just to be clear: what I mean is that we need a way to do the "tag
check" on the PC-indexed decoded page cache using raw bytes, so we can
determine hits there without invoking the predecoder.  If the decode
page cache misses and we have to repopulate it, then how we manage the
"backing" decode cache is probably not that big of an issue, and
probably does require going through the predecoder since otherwise we
won't know necessarily how long the undecoded instruction byte
sequence is.

In fact the main reason we need a cache there at all is so that we can
re-use the same StaticInst in multiple places; I'm not sure it really
saves that much time relative to doing the full decode.  (Probably
some, but I don't know how much.)


Just to reiterate this message: Nate's radix tree may or may not be a
good idea for directly looking up StaticInsts based on raw byte
sequences, but that's not what I was talking about; sorry for being
unclear.

I think what we really need is to replace or augment the ExtMachInst
that's currently stored in each StaticInst with the raw machine
instruction plus context info.  Then when we get a StaticInst from the
PC-based decode page cache, we can validate it by comparing the raw
machine instruction with the byte(s) we fetch and the current context,
without invoking the predecoder.  Once we get the predecoder out of
the path of decode page cache hits, I'm guessing the performance of
the predecoder itself won't matter so much anymore.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Yeah, I think I get what you're saying. If we're gathering up an ExtMachInst at some point (and we will still have to, pretty much no matter what) then we might as well stick it in the StaticInst. It can be handy for debugging if nothing else.
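
To make the validation step concrete, here's a rough sketch of what a hit check against raw bytes plus context could look like. Everything in it (DecodePageCache, CachedInst, DecodeContext, and the int payload standing in for a StaticInst) is a made-up name for illustration, not M5 code, and the real context info would be whatever mode bits actually affect decoding.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>

// Hypothetical names throughout; this is not the M5 implementation.
constexpr std::size_t MaxInstBytes = 15;  // longest legal x86 instruction

// Stand-in for whatever mode/context state affects decoding.
struct DecodeContext {
    std::uint8_t mode;
    bool operator==(const DecodeContext &o) const { return mode == o.mode; }
};

// One cache entry: the raw bytes and context it was decoded under,
// plus a payload standing in for the StaticInst.
struct CachedInst {
    std::array<std::uint8_t, MaxInstBytes> raw{};
    std::size_t len = 0;
    DecodeContext ctx{};
    int payload = 0;

    // The "tag check": compare stored bytes and context with what was fetched.
    bool matches(const std::uint8_t *fetched, std::size_t avail,
                 const DecodeContext &cur) const {
        return len <= avail && ctx == cur &&
               std::memcmp(raw.data(), fetched, len) == 0;
    }
};

class DecodePageCache {
  public:
    // Returns the entry on a validated hit, nullptr otherwise.
    // No predecoder is involved on this path.
    const CachedInst *lookup(std::uint64_t pc, const std::uint8_t *fetched,
                             std::size_t avail,
                             const DecodeContext &cur) const {
        auto it = byPc.find(pc);
        if (it == byPc.end() || !it->second.matches(fetched, avail, cur))
            return nullptr;
        return &it->second;
    }

    // Called on the miss path, after the predecoder has found the length.
    void insert(std::uint64_t pc, const std::uint8_t *bytes, std::size_t len,
                const DecodeContext &ctx, int payload) {
        assert(len <= MaxInstBytes);
        CachedInst e;
        std::memcpy(e.raw.data(), bytes, len);
        e.len = len;
        e.ctx = ctx;
        e.payload = payload;
        byPc[pc] = e;
    }

  private:
    std::unordered_map<std::uint64_t, CachedInst> byPc;
};
```

A lookup at the same PC with different bytes or a different context returns nullptr, which is the point where the predecoder and full decode would take over.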

Also I wanted to mention that while Nate's radix tree might not fit with x86 universally, if there were per-mode (pre)decoders with per-mode caches then it would work better. Then you'd use the cache that fit with whatever the circumstances were, which eliminates the ambiguity. That would have the potential to really pump up the memory overhead for the cache since there could be a lot of redundancy, but it still might not be that bad since that wouldn't need to scale with the simulation, just the diversity of instructions being executed.
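
As a rough illustration of the per-mode idea, here's a sketch with one byte-keyed cache per mode. The names (Mode, PerModeDecodeCache) are hypothetical, a std::map stands in for the radix tree, and a string stands in for the decoded instruction. The byte 0x40 is a handy example of the ambiguity: it's INC EAX in 32-bit code but a REX prefix in 64-bit mode.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical names; std::map is a stand-in for a radix tree.
enum class Mode : std::size_t { Legacy32 = 0, Long64 = 1, NumModes = 2 };

using RawBytes = std::vector<std::uint8_t>;

class PerModeDecodeCache {
  public:
    // Record a decoded instruction under the mode it was decoded in.
    void insert(Mode m, const RawBytes &bytes, const std::string &inst) {
        cacheFor(m)[bytes] = inst;
    }

    // Look up only in the cache for the current mode, so the same bytes
    // can never resolve to an instruction decoded under another mode.
    const std::string *lookup(Mode m, const RawBytes &bytes) const {
        assert(m != Mode::NumModes);
        const Cache &c = caches[static_cast<std::size_t>(m)];
        auto it = c.find(bytes);
        return it == c.end() ? nullptr : &it->second;
    }

  private:
    using Cache = std::map<RawBytes, std::string>;

    Cache &cacheFor(Mode m) {
        assert(m != Mode::NumModes);
        return caches[static_cast<std::size_t>(m)];
    }

    std::array<Cache, static_cast<std::size_t>(Mode::NumModes)> caches;
};
```

The redundancy cost shows up here as the same byte sequence possibly being stored once per mode, but the total is bounded by the number of distinct instructions seen, not by how long the simulation runs.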

On a semi-tangent, I'm mulling over the benefits of turning the ISA object into the new ISA namespace. The ISA namespace would still exist, but it would be for the actual implementation behind the scenes. All the bits that would be exposed to the outside world (i.e. everything not the ISA) would be brought into the class definition with typedefs, using directives, etc. That would make it a little clearer what was there just because it's handy for that particular ISA, and what functions are part of the established interface to the ISA that every ISA needs to implement. It would also allow templating classes on the ISA, which we can't do with the current namespace. The ability to define things in more than one file would still be preserved because things would be defined as part of the namespace like they are now and just brought into the ISA object to export.
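
Here's a minimal sketch of what that export pattern might look like. The two toy ISAs and all of the names are invented for illustration. Since C++ doesn't allow a using directive at class scope, the sketch exports things with type aliases and forwarding static functions instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Implementation namespaces (hypothetical); these could span many files.
namespace ToyFixedImpl {
using MachInst = std::uint32_t;
inline bool isNop(MachInst inst) { return inst == 0; }
// Handy for this ISA only; deliberately not exported below.
inline MachInst scratchHelper(MachInst inst) { return inst >> 1; }
}

namespace ToyByteImpl {
using MachInst = std::uint8_t;
inline bool isNop(MachInst inst) { return inst == 0x90; }
}

// ISA classes: only the established interface is brought in.
struct ToyFixedISA {
    using MachInst = ToyFixedImpl::MachInst;
    static bool isNop(MachInst inst) { return ToyFixedImpl::isNop(inst); }
};

struct ToyByteISA {
    using MachInst = ToyByteImpl::MachInst;
    static bool isNop(MachInst inst) { return ToyByteImpl::isNop(inst); }
};

// A class templated on the ISA, which a namespace can't be used for.
template <class ISA>
struct NopCounter {
    static std::size_t count(const typename ISA::MachInst *insts,
                             std::size_t n) {
        assert(insts != nullptr || n == 0);
        std::size_t nops = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (ISA::isNop(insts[i]))
                ++nops;
        }
        return nops;
    }
};
```

NopCounter<ToyFixedISA> and NopCounter<ToyByteISA> can then coexist in one binary, while scratchHelper stays visible only to code that reaches into the implementation namespace on purpose.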

The semi of semi-tangent comes in because the decode functionality would then be local to the ISA state, so it could be more easily switched in and out, virtual or not virtual, as appropriate. It would also allow the predecoder and the decoder to coordinate with each other to share StaticInst cache information.

I'm sure there are downsides to all this, one obvious one being that we'd lose some of the current isolation between different types of ISA header files. These tend to include each other pretty freely, though, so it might not be that different.

I haven't even really decided if I like this idea for sure, but it sounded interesting enough that I thought I'd mention it.


Gabe
