Quoting Gabe Black <gbl...@eecs.umich.edu>:
Gabe Black wrote:
This has come up in ARM and also in X86 with its STUPD (store with
update) microop. The problem has been updating the base register when,
one, the instruction may fault after initiateAcc, by which point the
original value of the base has been lost, and two, O3 doesn't call
completeAcc. The problem is compounded by the fact that O3 can
speculatively update the register and recover the old value if there's
a fault, and the simple CPUs can't.

What if we changed the instructions that update the base so that they
update it in initiateAcc and store the old value in an architecturally
invisible register? Then, if the instruction faults for whatever reason,
the fault object can know it needs to restore the old value of the base
before vectoring into the fault handler. If the instruction completes
normally, the updated base will be available to later instructions, and
the value of the backup register can be ignored. I don't -think- there
would be performance distortions from this, since the actual number of
sources/destinations doesn't matter, and this would be at least a little
more realistic, and faster at the simulator level, than splitting things
into microops.

This would be pretty easy to implement, I think, and would be entirely
contained in existing mechanisms in the ISA, so feasibility isn't really
in question. What I'd like to know is whether people think this is a
reasonable approach to this problem in the first place.
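A minimal C++ sketch of the backup-register idea, assuming a toy CPU whose register writes take effect immediately, as in the simple CPUs. Every name here (ToyCpu, StoreFault, StoreWithUpdate, executeStore) is hypothetical and not part of M5; the point is only to show initiateAcc updating the base early while saving the old value, and the fault object restoring it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical toy model, not M5's real classes. Register writes take effect
// immediately, as in a simple CPU.
struct ToyCpu {
    std::array<uint64_t, 8> intRegs{};  // architecturally visible registers
    uint64_t backupBase = 0;            // architecturally invisible backup register
    int backupIdx = -1;                 // which register the backup belongs to
};

// Hypothetical fault object that knows it must restore the base first.
struct StoreFault {
    void invoke(ToyCpu &cpu) const {
        assert(cpu.backupIdx >= 0);
        cpu.intRegs[cpu.backupIdx] = cpu.backupBase;  // undo the base update
        // Vectoring into the fault handler would happen here (omitted).
    }
};

// Store with update: the base is updated in initiateAcc, before the access.
struct StoreWithUpdate {
    int base;
    int64_t disp;

    uint64_t initiateAcc(ToyCpu &cpu) const {
        uint64_t oldBase = cpu.intRegs[base];
        uint64_t ea = oldBase + disp;
        cpu.backupBase = oldBase;  // stash the old value in the invisible register
        cpu.backupIdx = base;
        cpu.intRegs[base] = ea;    // later instructions see the new base
        return ea;                 // effective address for the memory access
    }
};

// Runs one store. `accessFaults` stands in for a fault raised after initiateAcc.
std::optional<StoreFault> executeStore(ToyCpu &cpu, const StoreWithUpdate &inst,
                                       bool accessFaults) {
    inst.initiateAcc(cpu);
    if (accessFaults) {
        StoreFault fault;
        fault.invoke(cpu);  // restore the base, then take the fault
        return fault;
    }
    return std::nullopt;  // completed normally; the backup value is ignored
}
```

On a CPU that writes registers live, this works: the base ends up either updated (no fault) or restored from the backup (fault).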
Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev

Hmm. This probably won't work. If the instruction faults, O3 would, I
think, roll the backup register back to its old value along with the
base, and the fault object would then clobber the correctly restored
base register with that stale value.
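A toy C++ model of why this breaks under O3, assuming (hypothetically) that the backup register is renamed and squashed like any other destination. RegState, ToyO3 and runFaultingStoreWithUpdate are illustrative names, not M5 code. The squash already leaves the base correct, and the fault's "restore" then overwrites it with whatever stale value the backup held before.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy model of an O3-like core, not M5's real classes. Writes made
// while executing go to a speculative copy that is discarded on a squash.
struct RegState {
    std::array<uint64_t, 8> intRegs{};
    uint64_t backupBase = 0;  // the invisible backup register
};

struct ToyO3 {
    RegState committed;  // state as of the last committed instruction

    // A store-with-update whose access faults after initiateAcc, handled with
    // the backup-register scheme.
    void runFaultingStoreWithUpdate(int base, int64_t disp) {
        assert(base >= 0 && base < static_cast<int>(committed.intRegs.size()));

        // initiateAcc runs speculatively and writes only to renamed state.
        RegState spec = committed;
        uint64_t oldBase = spec.intRegs[base];
        spec.backupBase = oldBase;
        spec.intRegs[base] = oldBase + disp;

        // The access faults, so O3 squashes the instruction and discards
        // `spec`. The base in `committed` is already the correct old value,
        // but the backup register was rolled back too and is now stale.

        // The fault object then "restores" the base from the stale backup.
        committed.intRegs[base] = committed.backupBase;
    }
};
```

Starting from a base of 0x1000 and a leftover backup value, the base ends up holding the leftover value instead of 0x1000.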
Gabe

OK, so, ideally we'd want to put any register updates for a store in
initiateAcc, since they don't depend on memory. That way other
instructions can use them sooner, O3 not running completeAcc stops
being a problem, etc. That doesn't work, though, because SimpleCPU and
O3 are inconsistent in the commit points an instruction passes through
as it runs. In SimpleCPU, state is updated live, so every setIntReg is
a commit point. In O3, instructions update a dyninst, so the commit
point is the actual commit stage. For regular instructions this
difference is masked by writing back results at the end of the execute
function, and only if there's no fault. For memory ops with initiateAcc
and completeAcc, however, not all possible faults have happened by the
time the instruction gives up control. The actual commit points of the
instruction then introduce functional differences and break the
consistency of the instruction model.
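To make the commit-point mismatch concrete, here is a small C++ sketch with two hypothetical execution contexts (SimpleContext, DynInstContext; neither is M5's real class). The same initiateAcc code writes the base through a common interface, and only the context decides when that write becomes permanent.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy models, not M5's real classes, of where a register write
// made in initiateAcc becomes permanent on each CPU model.
using RegFile = std::array<uint64_t, 8>;

struct ExecContext {
    virtual ~ExecContext() = default;
    virtual uint64_t readIntReg(int idx) const = 0;
    virtual void setIntReg(int idx, uint64_t val) = 0;
};

// SimpleCPU-like: state is updated live, so every setIntReg is a commit point.
struct SimpleContext : ExecContext {
    RegFile &regs;
    explicit SimpleContext(RegFile &r) : regs(r) {}
    uint64_t readIntReg(int idx) const override { return regs[idx]; }
    void setIntReg(int idx, uint64_t val) override { regs[idx] = val; }
};

// O3-like: writes are buffered in the dynamic instruction and reach the
// register file only at the commit stage, and only if nothing faulted.
struct DynInstContext : ExecContext {
    RegFile &regs;
    std::vector<std::pair<int, uint64_t>> pending;
    explicit DynInstContext(RegFile &r) : regs(r) {}
    uint64_t readIntReg(int idx) const override {
        for (auto it = pending.rbegin(); it != pending.rend(); ++it)
            if (it->first == idx)
                return it->second;  // see this instruction's own writes
        return regs[idx];
    }
    void setIntReg(int idx, uint64_t val) override { pending.emplace_back(idx, val); }
    void commit(bool faulted) {
        if (!faulted)
            for (const auto &[idx, val] : pending)
                regs[idx] = val;
        pending.clear();
    }
};

// initiateAcc of a store with update: the base write happens here, before
// the access has had a chance to fault.
void initiateStoreWithUpdate(ExecContext &xc, int base, int64_t disp) {
    assert(base >= 0 && base < 8);
    xc.setIntReg(base, xc.readIntReg(base) + disp);
}
```

If the access then faults, the simple context has already made the new base permanent, while the O3-like context drops it at commit, so the same instruction code produces different architectural state.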

The problem seems to be that O3 is smarter than SimpleCPU, or really
that O3 is better at undoing things that shouldn't have happened. One
solution might be to make SimpleCPU smarter, but why don't we make O3
selectively dumber?

We might be able to solve this problem if we change the semantics of
initiateAcc, the access, and completeAcc for stores. We could do the
same for loads for symmetry, but I won't push for it because of the
arguments Steve made about base-updating loads and the fact that it
might not work as well there. Anyway, instead of trying
(unsuccessfully) to string initiateAcc, the access, and completeAcc
together into one large atomic operation, let's make them all separate.
Once initiateAcc finishes, if it doesn't return a fault, it commits. If
the access faults later, that's handled, but the state written in
initiateAcc is already permanent. If something needs to be rolled
back, initiateAcc needs to set up backup state like I talked about in
my earlier email. completeAcc would then never be called for stores.
This is nice because it means all CPUs can behave the same, we get all
the benefits of writing back state in initiateAcc, there's no
simulated performance overhead as far as I can see, the impact on
existing ISA code is minimal, and (I hope) it shouldn't be that hard
to implement or carry that much baggage for later.
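A rough C++ sketch of the proposed store semantics, under stated assumptions: Fault, ArchState, StoreWithUpdate and runStore are hypothetical names, the alignment check is a made-up example of a fault raised inside initiateAcc, and restoring the base on a later access fault is shown only for architectures that require it. A single driver serves every CPU model: initiateAcc either faults with nothing written or commits, and completeAcc is never called.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical sketch of the proposed store semantics; names are illustrative,
// not M5's real API. A store commits as soon as initiateAcc returns no fault.
struct Fault {
    std::string name;
};
using MaybeFault = std::optional<Fault>;

struct ArchState {
    std::array<uint64_t, 8> intRegs{};
    uint64_t backupBase = 0;  // invisible backup register written by initiateAcc
};

struct StoreWithUpdate {
    int base;
    int64_t disp;

    // May fault (a made-up alignment check here). If it doesn't, everything it
    // wrote is permanent, including the backup of the old base.
    MaybeFault initiateAcc(ArchState &st, uint64_t &ea) const {
        assert(base >= 0 && base < 8);
        ea = st.intRegs[base] + disp;
        if (ea % 8 != 0)
            return Fault{"Alignment"};  // nothing has been written yet
        st.backupBase = st.intRegs[base];
        st.intRegs[base] = ea;
        return std::nullopt;
    }
};

// One driver shared by every CPU model. completeAcc is never called for stores.
MaybeFault runStore(ArchState &st, const StoreWithUpdate &inst, bool accessFaults) {
    uint64_t ea = 0;
    if (MaybeFault f = inst.initiateAcc(st, ea))
        return f;  // faulted before committing anything
    // initiateAcc has committed; a later fault can't roll its writes back.
    if (accessFaults) {  // stands in for a fault from the access itself, to `ea`
        // The fault restores the base from the backup register. Because the
        // backup was committed, it holds the correct old value.
        st.intRegs[inst.base] = st.backupBase;
        return Fault{"PageFault"};
    }
    return std::nullopt;
}
```

Because the backup register is committed along with the base, nothing rolls it back behind the fault's back, which is what sank the first version of the idea.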
So what do people think of this second version? Hopefully we don't
need a third :-).
Gabe