On Fri, Sep 10, 2010 at 10:48 AM, Ali Saidi <[email protected]> wrote:
>
> On Fri, 10 Sep 2010 08:55:47 -0500, Ali Saidi <[email protected]> wrote:
>> We just changed the micro-ops in ARM so that any of the register
>> updating loads/stores were micro-coded, calculated the new address,
>> placed it in a temp register, loaded from the temp, and then moved the
>> temp into the real register. This solution worked fine for both O3 and
>> the simple cpus. Since ARM has some interesting options for PC
>> relative loads and loading the PC via a load, I think you can pretty
>> much accomplish anything with three carefully crafted micro-ops.

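
For concreteness, the three-micro-op sequence quoted above can be sketched
roughly as follows. This is a toy Python model, not M5 code: the register
names, the helper functions (uop_add_imm, uop_load, uop_mov, load_update,
run) and the dict-based memory are all made up for the illustration.

```python
# Toy model only: names such as TEMP, uop_add_imm, and load_update are
# hypothetical and do not correspond to gem5/M5 classes or ISA definitions.

TEMP = "t0"  # scratch register visible only to micro-ops


def uop_add_imm(regs, mem, dest, src, imm):
    # Address calculation: writes exactly one register.
    regs[dest] = regs[src] + imm


def uop_load(regs, mem, dest, addr_reg):
    # A missing address stands in for a memory fault.
    if regs[addr_reg] not in mem:
        raise KeyError(f"fault at address {regs[addr_reg]:#x}")
    regs[dest] = mem[regs[addr_reg]]


def uop_mov(regs, mem, dest, src):
    # Register-to-register move: writes exactly one register.
    regs[dest] = regs[src]


def load_update(dest, base, offset):
    """Expand a pre-indexed load with base writeback into three micro-ops."""
    return [
        (uop_add_imm, (TEMP, base, offset)),  # temp = base + offset
        (uop_load, (dest, TEMP)),             # dest = mem[temp]
        (uop_mov, (base, TEMP)),              # base = temp (writeback last)
    ]


def run(uops, regs, mem):
    # Execute the micro-ops strictly in order.
    for fn, args in uops:
        fn(regs, mem, *args)


regs = {"r0": 0, "r1": 0x100, TEMP: 0}
mem = {0x104: 42}
run(load_update("r0", "r1", 4), regs, mem)
print(regs["r0"], hex(regs["r1"]))  # prints: 42 0x104
```

Each micro-op in this sketch writes only one register, and because the
writeback is the last micro-op, a fault on the load leaves the base
register untouched.
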
I think I mentioned this before, but it seems reasonable to me to do
store-updates in a single uop but require two uops for load-updates,
since a real pipeline would likely support one but not two register
writes per uop. I realize the asymmetry is a little weird, but that's
life. (Similarly, a store-update with a reg+reg addressing mode might
require two uops since you need to read three regs and not just two... I
vaguely recall that PowerPC might even have the restriction that store
updates can't use reg+reg addressing, even though it's available for
other memory accesses.)

That would still require some fix for Gabe's
completeAcc-not-getting-called-in-O3 problem. I'm a little confused
because I believe that the completeAcc callback is used for
store-conditionals to write back the success flag, which means that (1)
his solution of calling it right away won't work and (2) it must be
getting called somewhere in O3 since Alpha store-conditionals do work
there. Is there a reason not to update the register in initiateAcc?

All that said, I'm not necessarily opposed to piggybacking on the ARM
solution and just using multiple uops anyway. The patent is a great
guideline, but we shouldn't feel overly constrained by it.

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
