Quoting Steve Reinhardt <[email protected]>:
On Fri, Sep 10, 2010 at 10:48 AM, Ali Saidi <[email protected]> wrote:
On Fri, 10 Sep 2010 08:55:47 -0500, Ali Saidi <[email protected]> wrote:
We just changed the micro-ops in ARM so that every register-updating
load/store is micro-coded: it calculates the new address, places it in
a temp register, loads from the temp, and then moves the temp into the
real register. This solution worked fine for both O3 and the simple
cpus. Since ARM has some interesting options for PC
relative loads and loading the PC via a load, I think you can pretty
much accomplish anything with three carefully crafted micro-ops.
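
To make the three-step sequence above concrete, here is a rough sketch
in Python rather than in the simulator's own code. Everything in it
(the dict-based register file and memory, MemFault, load_with_update)
is made up for illustration and is not the actual ARM micro-op
implementation; it only shows the ordering of the three steps and why
a faulting load leaves the base register alone.

```python
class MemFault(Exception):
    """Raised by the toy memory when an address is not mapped."""


def load_with_update(regs, mem, dest, base, offset):
    """Model a register-updating load as three micro-ops.

    regs and mem are plain dicts; all names here are illustrative.
    """
    # uop 1: calculate the new address into a temporary register.
    temp = regs[base] + offset

    # uop 2: load from the address held in the temporary. A fault here
    # leaves both architectural registers untouched.
    if temp not in mem:
        raise MemFault(hex(temp))
    regs[dest] = mem[temp]

    # uop 3: move the temporary into the real base register.
    regs[base] = temp


regs = {"r0": 0, "r1": 0x100}
mem = {0x104: 42}
load_with_update(regs, mem, "r0", "r1", 4)
print(regs)
```

Because the base is only written by the last step, nothing has to be
undone if the load in the middle faults.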
I think I mentioned this before, but it seems reasonable to me to do
store-updates in a single uop but require two uops for load-updates,
since a real pipeline would likely support one but not two register
writes per uop. I realize the asymmetry is a little weird, but that's
life. (Similarly, a store-update with a reg+reg addressing mode might
require two uops since you need to read three regs and not just two...
I vaguely recall that PowerPC might even have the restriction that
store updates can't use reg+reg addressing, even though it's available
for other memory accesses.)
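
The port-counting argument above can be written down as a small
back-of-the-envelope calculation. This is only a sketch: the limits of
two register reads and one register write per uop are the assumptions
from the paragraph above, not properties of any particular pipeline,
and uops_needed is a hypothetical helper, not something in the
simulator.

```python
import math


def uops_needed(reg_reads, reg_writes, max_reads=2, max_writes=1):
    """Lower bound on uops given assumed per-uop register port limits."""
    return max(
        1,
        math.ceil(reg_reads / max_reads),
        math.ceil(reg_writes / max_writes),
    )


# (register reads, register writes) for each instruction form.
cases = {
    "store-update reg+imm": (2, 1),  # reads base and data, writes base
    "store-update reg+reg": (3, 1),  # reads base, index and data
    "load-update reg+imm": (1, 2),   # writes both dest and base
}
for name, (reads, writes) in cases.items():
    print(name, uops_needed(reads, writes))
```

Under those assumed limits the store-update with reg+imm addressing
fits in one uop, while the other two forms need two.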
Yeah, that made sense, and there's a store with update microop but no
load with update one. In general I've tried not to worry about details
like how many ports the register file would have since that starts to
get a little too close to a particular implementation. That might
sound a little odd since later I talk about wanting to get
store+update to work like it would be implemented, but there I'm
really trying to make sure the obvious optimization works like you'd
expect rather than tuning to particular microarchitectural choices.
That would still require some fix for Gabe's
completeAcc-not-getting-called-in-O3 problem. I'm a little confused
because I believe that the completeAcc callback is used for
store-conditionals to write back the success flag, which means that
(1) his solution of calling it right away won't work and (2) it must
be getting called somewhere in O3 since Alpha store-conditionals do
work there.
I don't know about store conditional, but I didn't see anywhere
completeAcc was called on the path stores take in the load/store
queue. There may be some other mechanism to handle that, but again I
don't really know how those work in O3. If they -do- work using some
other mechanism, then that would take care of (1), right?
Is there a reason not to update the register in initiateAcc?
Faults. If the access faults for some reason, you have to undo the
register updates in initiateAcc for the instruction to appear atomic.
In O3 you can just throw them away, I think, but in other models like
simple timing where you've updated live state you can't do that. I
suppose that means the microop solution does still have a slight
advantage as far as avoiding unnecessary delays since it wouldn't wait
for translation to finish to speculatively update the base, but
usually translation should hit in the TLB. In any case expecting the
update in initiateAcc to be treated speculatively doesn't really work
in all cases.
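
Here is a toy illustration of the fault problem described above,
written in Python with invented names (translate, TranslationFault,
store_update_eager, store_update_deferred). It does not use the real
initiateAcc/completeAcc interfaces; the "eager" variant just stands in
for updating the base before translation is known to succeed, on a
model that writes live state directly.

```python
class TranslationFault(Exception):
    """Raised by the toy translation step for an unmapped page."""


def translate(addr, mapped_pages, page_size=4096):
    # Identity mapping for mapped pages; fault otherwise.
    if addr // page_size not in mapped_pages:
        raise TranslationFault(hex(addr))
    return addr


def store_update_eager(regs, mem, mapped_pages, src, base, offset):
    """Update the base up front, before the fault outcome is known."""
    addr = regs[base] + offset
    regs[base] = addr  # live state is modified here
    paddr = translate(addr, mapped_pages)  # may raise after the update
    mem[paddr] = regs[src]


def store_update_deferred(regs, mem, mapped_pages, src, base, offset):
    """Update the base only after translation has succeeded."""
    addr = regs[base] + offset
    paddr = translate(addr, mapped_pages)  # may raise; nothing changed yet
    mem[paddr] = regs[src]
    regs[base] = addr


for store in (store_update_eager, store_update_deferred):
    regs = {"r1": 0x2000, "r2": 7}
    try:
        store(regs, {}, {0}, "r2", "r1", 8)  # page 2 is not mapped
    except TranslationFault:
        print(store.__name__, "base after fault:", hex(regs["r1"]))
```

After the fault, the eager variant has already changed the base and
would need explicit undo logic for the instruction to appear atomic;
the deferred variant leaves it as it was.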
All that said, I'm not necessarily opposed to piggybacking on the ARM
solution and just using multiple uops anyway. The patent is a great
guideline, but we shouldn't feel overly constrained by it.
This is true, and I'm sure there are other factors that will push us
out of sync with real implementations (like exactly how instructions
are microcoded, for instance), but I'd still feel warmer and fuzzier
if we could get the store with update to work as specified.
Also a less defensible reason I'd like to make it work is that it'd be
easier to get O3 to cooperate (I estimate, perhaps incorrectly) than
to go through all the microcode and update everything to use two
microops. I realize that's just me being lazy, but if we end up
avoiding that it'd be nice.
Gabe
PS. I think I may have to set x86 O3 aside for a while, so we probably
don't need to worry about this in the immediate term. It's probably
still good to discuss it, though.
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev