Re: [m5-dev] X86 performance

Steve Reinhardt Fri, 22 Oct 2010 15:59:59 -0700

On Fri, Oct 22, 2010 at 3:35 PM, Gabriel Michael Black
<[email protected]> wrote:
>> Even for RISC ISAs we should still be grabbing a cache line at a time
>> from the icache/memory system, IMO.  (Do we do this already or not?)
>
> We do in O3 but not in the simple CPUs if I remember correctly. I don't
> remember what InOrder does. There are some complications doing things this
> way with self modifying code. I remember one instance at VMware where
> someone mentioned that the Intel manuals supposedly said that between
> control flow instructions code wouldn't necessarily be checked for
> modification, and a recently ex-AMD engineer was surprised by that. There's
> apparently some ambiguity in how that sort of thing is supposed to work.


Ugh, SMC, bleah, I forgot about that.  But the thing is you only have
to deal with SMC in the case where you have to deal with variable-length
instructions too, right?  So adding a fetch buffer to a RISC ISA still
seems safe, as long as we are sure to flush it whenever the "icache
flush" PAL call (or whatever) gets executed.
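
To make the fetch-buffer idea concrete, here is a minimal sketch of a
one-line buffer for a fixed-width ISA. Every name in it (FetchBuffer,
LineReader, refillCount) is hypothetical and not taken from the M5
source; it assumes 4-byte aligned instructions and a power-of-two line
size, and it relies entirely on an explicit flush() to stay coherent
with modified code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>
#include <vector>

// All names here are hypothetical; this is not M5's actual fetch code.
using Addr = std::uint64_t;
using MachInst = std::uint32_t;  // assumes fixed 4-byte instructions

class FetchBuffer
{
  public:
    // Stand-in for an icache access: copies one full line, starting at
    // the given line-aligned address, into dst.
    using LineReader = std::function<void(Addr, std::uint8_t *)>;

    // lineSize must be a non-zero power of two.
    FetchBuffer(std::size_t lineSize, LineReader lineReader)
        : lineBytes(lineSize), reader(std::move(lineReader)),
          data(lineSize), valid(false), lineAddr(0), refills(0)
    {
        assert(lineSize != 0 && (lineSize & (lineSize - 1)) == 0);
    }

    // Return the instruction at pc (assumed 4-byte aligned), going to
    // the memory system only when pc falls outside the buffered line.
    MachInst fetch(Addr pc)
    {
        const Addr line = pc & ~static_cast<Addr>(lineBytes - 1);
        if (!valid || line != lineAddr) {
            reader(line, data.data());
            lineAddr = line;
            valid = true;
            ++refills;
        }
        MachInst inst;
        std::memcpy(&inst, &data[pc - line], sizeof(inst));
        return inst;
    }

    // To be called whenever the ISA's icache-flush operation executes.
    void flush() { valid = false; }

    // Number of times the buffer had to be refilled from memory.
    unsigned refillCount() const { return refills; }

  private:
    std::size_t lineBytes;
    LineReader reader;
    std::vector<std::uint8_t> data;
    bool valid;
    Addr lineAddr;
    unsigned refills;
};
```

Until flush() is called, the buffer keeps returning the old bytes even
if memory underneath has changed, which is exactly the SMC hazard
discussed above.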

> I have the start of an idea floating around in my head of putting this sort
> of caching mechanism in the predecoder but then making it cooperate with the
> inst cache somehow. Or maybe having an index into the cache based on either
> the ExtMachInst or the byte stream. I think splitting things out would
> really make things a lot bigger and harder to understand. Also having
> multiple scenarios and hence sets of requirements something can run under
> (be that the CPUs or the ISAs) makes development harder. We're seeing that
> with the base update stuff, I think.

I'd still really encourage you to work on cutting out the middleman
and find a way to go straight from raw bytes to StaticInsts via a
cache.  I really don't see any long-term benefit to doing this in two
stages.  I agree that we don't want a proliferation of code paths or
scenarios, so once we find a way to make it work for x86 the next step
is to see how we can take the same approach and streamline it to work
efficiently for the other ISAs, then use that to completely replace
the current ExtMachInst->StaticInst cache.
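
One possible shape for such a single-stage cache is sketched below. It
is only an illustration under stated assumptions: the cache is indexed
by PC and validated by comparing the stored encoding against the bytes
currently in memory, which is just one of several ways to key it. The
names ByteDecodeCache, SlowDecoder and toyDecode are hypothetical, the
StaticInst here is a toy struct rather than M5's class, and toyDecode
stands in for the full predecode-plus-decode path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Addr = std::uint64_t;

// Toy stand-in for a decoded instruction object.
struct StaticInst
{
    std::string mnemonic;
    std::size_t length;  // encoded length in bytes
};
using StaticInstPtr = std::shared_ptr<const StaticInst>;

// Stand-in for the slow path (predecode + decode) from raw bytes.
using SlowDecoder =
    std::function<StaticInstPtr(const std::uint8_t *, std::size_t)>;

class ByteDecodeCache
{
  public:
    explicit ByteDecodeCache(SlowDecoder slowPath)
        : slow(std::move(slowPath)) {}

    // Hit only if the bytes now at pc match the cached encoding, so a
    // stale entry left behind by modified code falls to the slow path.
    StaticInstPtr decode(Addr pc, const std::uint8_t *bytes,
                         std::size_t avail)
    {
        auto it = entries.find(pc);
        if (it != entries.end()) {
            const Entry &e = it->second;
            if (e.bytes.size() <= avail &&
                std::memcmp(e.bytes.data(), bytes, e.bytes.size()) == 0) {
                return e.inst;
            }
        }
        ++misses;
        StaticInstPtr inst = slow(bytes, avail);
        entries[pc] = Entry{
            std::vector<std::uint8_t>(bytes, bytes + inst->length), inst};
        return inst;
    }

    unsigned missCount() const { return misses; }

  private:
    struct Entry
    {
        std::vector<std::uint8_t> bytes;  // encoding that produced inst
        StaticInstPtr inst;
    };
    std::unordered_map<Addr, Entry> entries;
    SlowDecoder slow;
    unsigned misses = 0;
};

// Toy slow path: 0x90 decodes as a 1-byte nop; any other first byte is
// treated as a 5-byte "mov". Real decoding is far more involved.
StaticInstPtr toyDecode(const std::uint8_t *bytes, std::size_t avail)
{
    const bool isNop = (bytes[0] == 0x90);
    const std::size_t len = isNop ? 1 : 5;
    assert(avail >= len);
    return std::make_shared<const StaticInst>(
        StaticInst{isNop ? "nop" : "mov", len});
}
```

The byte comparison on a hit costs a short memcmp, but it never needs
an ExtMachInst to be built; whether mode bits would also have to be
part of the key is left open here.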

Maybe we need to rethink ExtMachInsts entirely even...

> For instance, right now the predecoder in x86 computes, for every single
> instruction that passes through it, what the operand size, address size,
> stack size, and mode should be. Some of that information may not change in
> hours of simulation, and at most would change very infrequently, but it gets
> rediscovered over and over and over and over to contextualize ExtMachInsts
> for the regular decoder. It would be a huge performance win, I think,
> considering how often that's called, if the 64bit long mode predecoder could
> be installed and called through a virtual function that already knew all
> that stuff and just plunked it in place with a simple copy. This is the
> strongest use case, I think.

OK, I see, so it's a matter of replacing all those if/else if branches
with a function pointer indirection.  That makes sense (in that I
understand your point now), though from a pure performance perspective
I'm sure that once you get the raw bytes-to-StaticInst cache working
the performance of the predecoder itself won't matter anymore ;-).
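
For reference, the indirection being discussed could look roughly like
the sketch below: the per-mode defaults are computed once when a mode
object is built, and the per-instruction work shrinks to one virtual
call plus a struct copy. All class and field names are hypothetical,
the ExtMachInst is a toy struct, and the sizes (in bytes) are a
simplified picture of the x86 defaults.

```cpp
#include <cassert>
#include <cstdint>

enum class CpuMode { Protected32, Long64 };

// Context that rarely changes between instructions (sizes in bytes).
struct DecodeDefaults
{
    std::uint8_t opSize;
    std::uint8_t addrSize;
    std::uint8_t stackSize;
    CpuMode mode;
};

// Toy stand-in for the real ExtMachInst.
struct ExtMachInst
{
    DecodeDefaults ctx;
    std::uint8_t opcode;
};

// Interface installed into the predecoder; one object per mode.
class PredecoderMode
{
  public:
    virtual ~PredecoderMode() = default;
    virtual void contextualize(ExtMachInst &emi) const = 0;
};

// A mode whose defaults were worked out once, at construction time.
class FixedModePredecoder : public PredecoderMode
{
  public:
    explicit FixedModePredecoder(DecodeDefaults d) : defaults(d) {}

    // No if/else chain per instruction, just a plain copy.
    void contextualize(ExtMachInst &emi) const override
    {
        emi.ctx = defaults;
    }

  private:
    DecodeDefaults defaults;
};

class Predecoder
{
  public:
    explicit Predecoder(const PredecoderMode *m) : current(m) {}

    // Called only on the rare events that actually change the mode.
    void setMode(const PredecoderMode *m) { current = m; }

    ExtMachInst predecode(std::uint8_t opcode) const
    {
        ExtMachInst emi{};
        emi.opcode = opcode;
        current->contextualize(emi);
        return emi;
    }

  private:
    const PredecoderMode *current;
};
```

A mode switch then becomes a single setMode() call instead of a set of
conditions re-evaluated for every instruction.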

I can see where having a per-mode cache could make sense for modes
that redefine most instructions (like x86 32- vs 64-bit?) but it's
less clear for things like Alpha PAL mode that only affect a small
subset of the instruction space.

> Doesn't decode(DecodeContext *this, MachInst i) look like the C version of
> DecodeContext::decode(MachInst i)? :-)

It sure does, which is why I often feel we're discussing these things
like they're huge changes when I'm not convinced they're not just
superficial syntax...
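
A tiny illustration of that equivalence, with a made-up DecodeContext
that just masks out opcode bits (the opcodeMask field is hypothetical,
and since `this` is reserved in C++ the explicit parameter is called
`self`):

```cpp
#include <cassert>
#include <cstdint>

using MachInst = std::uint32_t;

struct DecodeContext
{
    std::uint32_t opcodeMask;

    // Member-function form: the context arrives as the implicit `this`.
    std::uint32_t decode(MachInst i) const { return i & opcodeMask; }
};

// Free-function form: the same context, passed explicitly as `self`.
std::uint32_t decode(const DecodeContext *self, MachInst i)
{
    return i & self->opcodeMask;
}
```

Both forms take the same two inputs and differ only in how the call is
spelled: ctx.decode(i) versus decode(&ctx, i).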

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
