Quoting Steve Reinhardt <[email protected]>:

On Fri, Sep 10, 2010 at 10:48 AM, Ali Saidi <[email protected]> wrote:

On Fri, 10 Sep 2010 08:55:47 -0500, Ali Saidi <[email protected]> wrote:
We just changed the micro-ops in ARM so that any of the register
updating loads/stores were micro-coded, calculated the new address,
placed it in a temp register, loaded from the temp, and then moved the
temp into the real register. This solution worked fine for both O3 and
the simple cpus. Since ARM has some interesting options for PC
relative loads and loading the PC via a load, I think you can pretty
much accomplish anything with three carefully crafted micro-ops.
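
As a rough illustration of the three-micro-op sequence described above, here is a toy Python model. The register and memory dictionaries, the MemFault class, and the function name are all invented for this sketch; none of it is gem5 or ARM ISA-description code.

```python
class MemFault(Exception):
    """Hypothetical fault raised when the toy memory has no such address."""


def load_with_update(regs, mem, dest, base, offset):
    """Model a register-updating load as three micro-ops (illustrative only)."""
    # uop 1: compute the new address into a temporary register.
    temp = regs[base] + offset
    # uop 2: load from the address held in the temp. This is the only
    # step in the sequence that can fault.
    if temp not in mem:
        raise MemFault(hex(temp))
    regs[dest] = mem[temp]
    # uop 3: move the temp into the real (architectural) base register.
    regs[base] = temp
```

Because the base register is only written by the last micro-op, a fault in the load leaves the base untouched in this model.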

I think I mentioned this before, but it seems reasonable to me to do
store-updates in a single uop but require two uops for load-updates,
since a real pipeline would likely support one but not two register
writes per uop.  I realize the asymmetry is a little weird, but that's
life.  (Similarly, a store-update with a reg+reg addressing mode might
require two uops since you need to read three regs and not just two...
I vaguely recall that PowerPC might even have the restriction that
store updates can't use reg+reg addressing, even though it's available
for other memory accesses.)
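
The register-port argument above can be sketched as a small budget calculation. The limits of two register reads and one register write per uop are assumptions taken from the wording of the paragraph, not from any particular pipeline, and uops_needed is a made-up helper.

```python
from math import ceil


def uops_needed(reg_reads, reg_writes, max_reads=2, max_writes=1):
    """Minimum uops if each uop may read max_reads and write max_writes registers.

    The default limits are assumptions for illustration only.
    """
    return max(1, ceil(reg_reads / max_reads), ceil(reg_writes / max_writes))


# (register reads, register writes) for each instruction form
FORMS = {
    "store-update reg+imm": (2, 1),  # reads data and base; writes base
    "load-update reg+imm": (1, 2),   # reads base; writes dest and base
    "store-update reg+reg": (3, 1),  # reads data, base and index; writes base
}

for name, (reads, writes) in FORMS.items():
    print(name, uops_needed(reads, writes))
```

Under those assumed limits, only the reg+imm store-update fits in a single uop, which is the asymmetry described above.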

Yeah, that made sense, and there's a store with update microop but no load with update one. In general I've tried not to worry about details like how many ports the register file would have since that starts to get a little too close to a particular implementation. That might sound a little odd since later I talk about wanting to get store+update to work like it would be implemented, but there I'm really trying to make sure the obvious optimization works like you'd expect rather than tuning to particular microarchitectural choices.


That would still require some fix for Gabe's
completeAcc-not-getting-called-in-O3 problem.  I'm a little confused
because I believe that the completeAcc callback is used for
store-conditionals to write back the success flag, which means that
(1) his solution of calling it right away won't work and (2) it must
be getting called somewhere in O3 since Alpha store-conditionals do
work there.

I don't know about store conditional, but I didn't see anywhere completeAcc was called on the path stores take in the load/store queue. There may be some other mechanism to handle that, but again I don't really know how those work in O3. If they -do- work using some other mechanism, then that would take care of (1), right?


Is there a reason not to update the register in initiateAcc?

Faults. If the access faults for some reason, you have to undo the register updates made in initiateAcc for the instruction to appear atomic. In O3 you can just throw them away, I think, but in other models like simple timing, where you've updated live state, you can't do that. I suppose that means the microop solution does still have a slight advantage as far as avoiding unnecessary delays, since it wouldn't wait for translation to finish to speculatively update the base, but usually translation should hit in the TLB. In any case, expecting the update in initiateAcc to be treated speculatively doesn't really work in all cases.
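
To make the fault problem concrete, here is a toy Python sketch of the two orderings. It is not gem5 code: the translate function, the 4 KiB page size, the TranslationFault class, and both store_update_* functions are hypothetical, and the register file is a plain dictionary standing in for live architectural state.

```python
class TranslationFault(Exception):
    """Hypothetical fault raised when an address is not mapped."""


def translate(vaddr, mapped_pages):
    # Assumed 4 KiB pages with identity mapping, for illustration only.
    if vaddr // 4096 not in mapped_pages:
        raise TranslationFault(hex(vaddr))
    return vaddr


def store_update_eager(regs, mem, mapped_pages, src, base, offset):
    """Update the base early (as if in initiateAcc); must undo it on a fault."""
    value = regs[src]
    old_base = regs[base]
    addr = old_base + offset
    regs[base] = addr  # live state is modified before translation finishes
    try:
        paddr = translate(addr, mapped_pages)
    except TranslationFault:
        regs[base] = old_base  # roll back so the instruction appears atomic
        raise
    mem[paddr] = value


def store_update_deferred(regs, mem, mapped_pages, src, base, offset):
    """Update the base late (as if in completeAcc); a fault needs no undo."""
    addr = regs[base] + offset
    paddr = translate(addr, mapped_pages)  # a fault here leaves regs untouched
    mem[paddr] = regs[src]
    regs[base] = addr
```

Both versions end in the same state; the eager one just needs the explicit rollback path whenever the model has no way to discard speculative register writes.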


All that said, I'm not necessarily opposed to piggybacking on the ARM
solution and just using multiple uops anyway.  The patent is a great
guideline, but we shouldn't feel overly constrained by it.

This is true, and I'm sure there are other factors that will push us out of sync with real implementations (like exactly how instructions are microcoded, for instance), but I'd still feel warmer and fuzzier if we could get the store with update to work as specified.

Also, a less defensible reason I'd like to make it work is that it would be easier to get O3 to cooperate (I estimate, perhaps incorrectly) than to go through all the microcode and update everything to use two microops. I realize that's just me being lazy, but if we end up avoiding that, it'd be nice.

Gabe

PS. I think I may have to set x86 O3 aside for a while, so we probably don't need to worry about this in the immediate term. It's probably still good to discuss it, though.
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
