Hmm, I'm really not familiar with O3 myself, but the ability to do
these kinds of fast tests on machine instructions is one of the
fundamental advantages of RISC over CISC.  So if we want to go for
accuracy, then it's not unreasonable to have the checks for things
like unconditional branches be done in different pipeline stages for
different ISAs.  I'm not necessarily advocating that, since I think
CPU model simplicity is also a worthy goal, just pointing out that
this may be a fundamental tradeoff between realism and simplicity.
Furthermore, some x86 architectures use predecode bits in the icache
to help with this kind of thing; that's a level of detail I don't want
to see us get into, but you could argue that doing some predecoder
magic is logically just approximating the effect of icache predecode
bits and I would not fight you on that.
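
A minimal sketch of the predecoder idea discussed in this thread,
assuming a fixed-width, Alpha-like 32-bit encoding with the opcode in
the top six bits. Every name here (QuickClass, PredecodeResult,
predecode) is hypothetical and not actual M5 code, the uint32_t field
is only a stand-in for a real ExtMachInst, and the opcode values are
illustrative of the Alpha branch formats rather than authoritative:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; none of this is actual M5 code.
// A loose, cheap-to-compute classification that fetch could use for
// branch prediction before the full decode happens.
enum class QuickClass { Other, UncondDirect, CondDirect, Indirect };

struct PredecodeResult {
    uint32_t extMachInst;  // stand-in for the ISA's ExtMachInst
    QuickClass cls;        // loose classification only
};

// Classify an instruction word by looking at its opcode field alone.
PredecodeResult predecode(uint32_t raw)
{
    const uint32_t opcode = raw >> 26;  // assumed: opcode in bits 31:26
    assert(opcode <= 0x3F);             // 6-bit field by construction

    QuickClass cls = QuickClass::Other;
    if (opcode == 0x30 || opcode == 0x34) {
        // Assumed encodings for unconditional branch and branch-to-sub.
        cls = QuickClass::UncondDirect;
    } else if (opcode == 0x1A) {
        // Assumed encoding for the register-indirect jump group.
        cls = QuickClass::Indirect;
    } else if (opcode >= 0x31) {
        // Remaining opcodes in 0x31-0x3F: conditional branches.
        cls = QuickClass::CondDirect;
    }
    return {raw, cls};
}
```

Fetch would consult only the cls field, and the untouched
instruction bits would still go through the full decoder in the
decode stage. An ISA with no cheap checks could simply return Other
for everything.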

So the bottom line is that it's pretty much up to you, IMO; I just
wanted to give you some more food for thought.

Steve

On Fri, Aug 20, 2010 at 9:46 PM, Gabe Black <[email protected]> wrote:
> Maybe the predecoder should detect those easy-to-decode instructions? It
> could return an ExtMachInst and a loose classification of the
> instruction if that would make sense for that ISA. That way the full-on
> decode wouldn't happen until the decode stage, but you could still take
> advantage of the quick checks in fetch. Thoughts?
>
> Gabe
>
> Gabe Black wrote:
>> Yeah, I'd guessed that's what it was. I tried to implement my second
>> approach and that didn't work out, so I tried my third approach and that
>> did.
>>
>> Now I need to teach O3 how to handle variable instruction lengths and
>> more complex microcode, and I've rediscovered O3 does its decode in
>> fetch. It's doing that, I think, to detect unconditional branches and
>> other instructions that are really easy to decode in Alpha at the time
>> it does branch prediction, so it gets a more intelligent answer. For
>> SPARC, I put
>> the macroop unpacking right there so all the instructions leaving decode
>> (which is really in fetch) are actually going to get executed. The
>> macroops are used up and then discarded without entering the rest of the
>> system. That works when you've got a fixed size, fairly short, straight
>> line run of microops, but it's going to be a little messier for x86. I'd
>> like to pull the decode into the actual decode stage, but then that
>> would disrupt what Alpha is doing with these early decode bits. I
>> haven't thought about it enough to determine if there's a problem, but
>> if anyone wants to throw in their two cents as far as suggestions or
>> constraints feel free.
>>
>> Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
