On Sun, Jun 13, 2010 at 3:04 PM, Gabe Black <[email protected]> wrote:
> Gabe Black wrote:
>> Timothy M Jones wrote:
>>
>>> Hi everyone,
>>>
>>> On 06/06/2010 18:59, Ali Saidi wrote:
>>>
>>>> On Jun 6, 2010, at 4:53 PM, Steve Reinhardt wrote:
>>>>
>>>>> I've only thought about this briefly, but here are a few quick
>>>>> reactions:
>>>>>
>>>>> - PowerPC has updating ld/st instructions too. How are these handled?
>>>>> Whatever we do, we should do the same thing for both.
>>>>>
>>>> Tim, care to comment?
>>>>
>>> Yes, the Power ISA has loads and stores that update a given register
>>> with the effective address, exactly like the example Ali gave.
>>>
>>> I've written these so that the register update is performed in
>>> completeAcc(). I haven't profiled performance in O3 and hadn't
>>> thought about the dependencies that this would cause. If you've got a
>>> better solution to use then I am happy to alter the Power code to use
>>> this.
>>>
>>> To be honest, I am implementing some more instructions in Power and
>>> have come across two that load or store multiple register values to
>>> memory. I was going to write a question about implementing this to the
>>> list anyway! If the solution to the above problem was creating
>>> micro-ops, then I could implement the multiple load/store instructions
>>> in the same way too, otherwise I will have to find a different
>>> solution to this.
>>>
>>> Cheers
>>> Tim
>>>
>>
>> I think microops are generally going to be a pretty good solution, but
>> one catch is that when they can't execute in parallel for whatever
>> reason (i.e., in the simple CPU) they'll count as two instructions
>> instead of one. That could mess with any rough performance measurement
>> you were trying to do. Also, it's not a problem per se, but make sure
>> you do the register update second so that the load/store has a chance
>> to kill the macroop.
>>
>> Also, if you already have a microop that does a register update like
>> this (the stupd, store with update, microop in x86 is like this) then it
>> would have to be split into two microops. This scheme would effectively
>> make this microop impossible without reintroducing this problem. I'd
>> assume it was part of the microop set I borrowed for a reason, and we'd
>> be defeating that by eliminating it.
>>
>> Gabe
>> _______________________________________________
>> m5-dev mailing list
>> [email protected]
>> http://m5sim.org/mailman/listinfo/m5-dev
>>
>
> I just thought of another, more important drawback. In an in-order
> pipeline, the writeback will take up an extra pipeline stage,
> effectively adding a bubble. In reality, I'd imagine the update would be
> computed in the execute stage at the same time as the address
> computation. This is especially important for ARM which, if I'm not
> mistaken, is usually implemented as an in-order pipeline.

That is mostly accurate. To the best of my knowledge, the Cortex-A9 is the
only out-of-order ARM core with speculative execution.

>
> Gabe

-Soumyaroop

--
Soumyaroop Roy
Ph.D. Candidate
Department of Computer Science and Engineering
University of South Florida, Tampa
http://www.csee.usf.edu/~sroy
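
The ordering point raised in the quoted discussion (do the memory access first and the base-register update second, so that a faulting access can kill the macroop before the update becomes visible) can be sketched with a toy model. This is only an illustration in Python, not M5 code: `Fault`, `load_uop`, `update_uop`, `run_macroop`, and `LOAD_WITH_UPDATE` are hypothetical names, and registers and memory are modeled as plain dictionaries.

```python
class Fault(Exception):
    """Hypothetical stand-in for a memory fault raised by an access."""


def load_uop(regs, mem, rd, base, offset):
    # First microop: compute the effective address and load from it.
    # Nothing is written to the register file if the access faults.
    addr = regs[base] + offset
    if addr not in mem:
        raise Fault(addr)
    regs[rd] = mem[addr]


def update_uop(regs, mem, rd, base, offset):
    # Second microop: write the effective address back to the base register.
    regs[base] = regs[base] + offset


def run_macroop(regs, mem, uops, args):
    """Run microops in order; stop at the first fault.

    Returns (number of microops completed, whether a fault occurred).
    """
    completed = 0
    for uop in uops:
        try:
            uop(regs, mem, *args)
        except Fault:
            return completed, True
        completed += 1
    return completed, False


# Assumed decomposition of a load-with-update: access first, update second.
LOAD_WITH_UPDATE = (load_uop, update_uop)

if __name__ == "__main__":
    regs = {"r1": 0, "r2": 100}
    print(run_macroop(regs, {104: 42}, LOAD_WITH_UPDATE, ("r1", "r2", 4)), regs)

    regs = {"r1": 0, "r2": 100}
    print(run_macroop(regs, {}, LOAD_WITH_UPDATE, ("r1", "r2", 4)), regs)
```

In the non-faulting case both microops complete, which also illustrates the counting concern: a model that counts microops as instructions would report two for what is architecturally one instruction. In the faulting case the base register is left untouched because the update never runs.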
