On Sun, Jun 13, 2010 at 3:04 PM, Gabe Black <[email protected]> wrote:
> Gabe Black wrote:
>> Timothy M Jones wrote:
>>
>>> Hi everyone,
>>>
>>> On 06/06/2010 18:59, Ali Saidi wrote:
>>>
>>>> On Jun 6, 2010, at 4:53 PM, Steve Reinhardt wrote:
>>>>
>>>>
>>>>> I've only thought about this briefly, but here are a few quick
>>>>> reactions:
>>>>>
>>>>> - PowerPC has updating ld/st instructions too.  How are these handled?
>>>>> Whatever we do, we should do the same thing for both.
>>>>>
>>>> Tim, care to comment?
>>>>
>>>>
>>> Yes, the Power ISA has loads and stores that update a given register
>>> with the effective address, exactly like the example Ali gave.
>>>
>>> I've written these so that the register update is performed in
>>> completeAcc().  I haven't profiled performance in O3 and hadn't
>>> thought about the dependencies that this would cause.  If you've got a
>>> better solution to use then I am happy to alter the Power code to use
>>> this.
>>>
>>> To be honest, I am implementing some more instructions in Power and
>>> have come across two that load or store multiple register values to
>>> memory. I was going to write a question about implementing this to the
>>> list anyway!  If the solution to the above problem was creating
>>> micro-ops, then I could implement the multiple load/store instructions
>>> in the same way too, otherwise I will have to find a different
>>> solution to this.
>>>
>>> Cheers
>>> Tim
>>>
>>>
>>
>> I think microops are generally going to be a pretty good solution, but
>> one catch is that when they can't execute in parallel for whatever
>> reason (e.g., in the simple CPU) they'll count as two instructions
>> instead of one. That could mess with any rough performance measurement you
>> were trying to do. Also, it's not a problem per se, but make sure you do
>> the register update second so that the load/store has a chance to kill
>> the macroop.
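
[Editorial sketch, not part of the original message.] The ordering point above can be illustrated with a toy model. Everything here (`ToyCPU`, `Fault`, `load_with_update`) is a hypothetical illustration and not M5 code; it only assumes that a faulting load microop aborts the rest of the macroop and that a simple CPU counts each microop it executes.

```python
class Fault(Exception):
    """Hypothetical stand-in for a memory fault raised by a load microop."""


class ToyCPU:
    """Toy register/memory state; not modeled on any real M5 class."""

    def __init__(self, regs, mem):
        self.regs = dict(regs)
        self.mem = dict(mem)
        self.uops_executed = 0  # a simple CPU would count each microop

    def load(self, dest, base, offset):
        # Load microop: faults if the effective address is unmapped.
        self.uops_executed += 1
        addr = self.regs[base] + offset
        if addr not in self.mem:
            raise Fault(addr)
        self.regs[dest] = self.mem[addr]

    def update(self, base, offset):
        # Register-update microop: writes the effective address back.
        self.uops_executed += 1
        self.regs[base] += offset


def load_with_update(cpu, dest, base, offset, update_first=False):
    """Run a load-with-update as two microops; return True if it completed."""
    try:
        if update_first:
            cpu.update(base, offset)
            cpu.load(dest, base, 0)
        else:
            cpu.load(dest, base, offset)
            cpu.update(base, offset)
    except Fault:
        return False  # remaining microops of the macroop are never run
    return True
```

With the load first, a fault leaves the base register untouched, so there is nothing to roll back. With the update first, the base register has already changed by the time the load faults. In both orderings a successful macroop shows up as two executed microops in this toy count.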
>>
>> Also, if you already have a microop that does a register update like
>> this (the stupd, store with update, microop in x86 is like this) then it
>> would have to be split into two microops. This scheme would effectively
>> make this microop impossible without reintroducing this problem. I'd
>> assume it was part of the microop set I borrowed for a reason, and we'd
>> be defeating that by eliminating it.
>>
>> Gabe
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>
> I just thought of another, more important drawback. In an in-order
> pipeline, the writeback will take up an extra pipeline slot,
> effectively adding a bubble. In reality, I'd imagine the update would be
> computed in the execute stage at the same time as the address
> computation. This is especially important for ARM which, if I'm not
> mistaken, is usually implemented as an in-order pipeline.
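
[Editorial sketch, not part of the original message.] A back-of-the-envelope version of the bubble argument, assuming an idealized single-issue in-order pipeline with no stalls, where every microop occupies one issue slot. The function name and the default depth of 5 stages are assumptions made for illustration only.

```python
def pipeline_cycles(num_uops, depth=5):
    """Cycles to drain num_uops through an ideal single-issue pipeline.

    Assumes one microop enters per cycle and nothing ever stalls.
    """
    if num_uops == 0:
        return 0
    return depth + num_uops - 1


updating_loads = 10

# Update folded into the load: one issue slot per instruction.
fused = pipeline_cycles(updating_loads)

# Update split into its own microop: two issue slots per instruction.
split = pipeline_cycles(2 * updating_loads)

extra_cycles = split - fused
```

Under these assumptions the split version costs one extra cycle per updating load, which is the bubble described above.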

That is mostly accurate. To the best of my knowledge, the Cortex-A9 is the
only out-of-order core with speculative execution.

>
> Gabe

-Soumyaroop




-- 
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy