Quoting Gabe Black <gbl...@eecs.umich.edu>:

Gabe Black wrote:
This has come up in ARM and also in X86 with its STUPD (store with
update) microop. The problem has been updating the base register when,
one, the instruction may fault after initiateAcc and the initial value
is lost, and two, completeAcc isn't called by O3. The problem is
compounded by the fact that O3 can speculatively update the register and
recover the old value if there's a fault, and the simple CPUs can't.

What if we changed the instructions that update the base to update the
base in initiateAcc and store the old value in an architecturally
invisible register? Then, if the instruction faults for whatever reason,
the fault object can know it needs to restore the old value of the base
before vectoring into the fault handler. If the instruction completes
normally the value of the base will be updated for consumption by later
instructions, and the value of the backup register can be ignored. I
don't -think- there would be performance distortions from this, since
the actual number of sources/destinations doesn't matter. It would also
be at least a little more realistic, and faster at the simulator level,
than splitting things into microops.
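
A minimal sketch of the backup-register idea on a CPU that updates
state live, with no speculation. Every name here (RegFile, BackupReg,
initiateStoreUpdate, restoreBaseOnFault, runStore) is hypothetical and
made up for illustration; none of it is existing M5 code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical register file: slots 0..15 are architectural, slot 16 is
// an architecturally invisible backup register. Not M5's real layout.
constexpr int NumArchRegs = 16;
constexpr int BackupReg = NumArchRegs;

struct RegFile {
    std::array<uint64_t, NumArchRegs + 1> regs{};
};

enum class Fault { None, PageFault };

// What initiateAcc would do for a base-updating store: save the old
// base in the backup register, then write the new base right away.
inline void initiateStoreUpdate(RegFile &rf, int base, int64_t disp)
{
    assert(base >= 0 && base < NumArchRegs);
    rf.regs[BackupReg] = rf.regs[base];
    rf.regs[base] += static_cast<uint64_t>(disp);
}

// What the fault object would do before vectoring into the handler.
inline void restoreBaseOnFault(RegFile &rf, int base)
{
    rf.regs[base] = rf.regs[BackupReg];
}

// Runs one store; accessResult stands in for whatever the memory
// access eventually reports. Returns the base value the handler or the
// next instruction would see.
inline uint64_t runStore(RegFile &rf, int base, int64_t disp,
                         Fault accessResult)
{
    initiateStoreUpdate(rf, base, disp);
    if (accessResult != Fault::None)
        restoreBaseOnFault(rf, base);
    return rf.regs[base];
}
```

On the non-faulting path the backup slot is simply never read again,
which is the "can be ignored" case above.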

This would be pretty easy to implement, I think, and would be entirely
contained in existing mechanisms in the ISA, so there isn't really any
question there. What I'd like to know is whether people think this is a
reasonable approach to this problem in the first place.

Gabe
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Hmm. This probably won't work. O3 would revert to the old value of the
backup register, I think, and the fault object would clobber the
correctly restored base register with that old value.
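
A sketch of the failure, assuming O3 really does roll back every
register the faulting instruction wrote, the backup register included.
The Regs struct and faultingStoreOnO3 are hypothetical names used only
to model that assumption.

```cpp
#include <cstdint>

struct Regs {
    uint64_t base;
    uint64_t backup;  // architecturally invisible backup register
};

// Models one faulting base-updating store on a CPU that squashes all of
// the instruction's register writes, then lets the fault object restore
// the base from the backup register.
inline Regs faultingStoreOnO3(Regs before, int64_t disp)
{
    // Speculative execution of initiateAcc: save old base, update base.
    Regs spec = before;
    spec.backup = spec.base;
    spec.base += static_cast<uint64_t>(disp);
    (void)spec;  // the fault squashes these writes

    // After the squash, both registers hold their pre-instruction
    // values, so backup holds whatever an earlier instruction left.
    Regs arch = before;

    // The fault object now copies that stale backup over the base,
    // which the squash had already restored correctly.
    arch.base = arch.backup;
    return arch;
}
```

Starting from base 0x1000 and a stale backup of 0xdead, the base ends
up as 0xdead rather than 0x1000.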

Gabe


OK, so, ideally we'd want to put any register updates for a store in initiateAcc since they don't depend on memory. That way other instructions can use them sooner, it doesn't matter that O3 doesn't run completeAcc, etc. That doesn't work, though, because SimpleCPU and O3 are inconsistent about the commit points an instruction goes through as it runs. In SimpleCPU, state is updated live, so every setIntReg is a commit point. In O3, instructions update a dyninst, so the commit point is the actual commit stage. For regular instructions this is masked by writing back results at the end of the execute function, and only if there's no fault. For memory ops with initiateAcc and completeAcc, though, not all possible faults have happened by the point the instruction loses control. The differing commit points then introduce functional differences and break the consistency of the instruction model.
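
To make the mismatch concrete, here is a toy model of the two commit behaviors. SimpleContext, BufferedContext, simpleRun and bufferedRun are hypothetical stand-ins, not the real SimpleCPU or O3 classes; only the names setIntReg and initiateAcc are borrowed from the discussion above.

```cpp
#include <cstdint>
#include <map>

using Regs = std::map<int, uint64_t>;

// SimpleCPU-like: every setIntReg writes architectural state directly.
struct SimpleContext {
    Regs &arch;
    explicit SimpleContext(Regs &a) : arch(a) {}
    void setIntReg(int idx, uint64_t val) { arch[idx] = val; }
};

// O3-like: writes land in a per-instruction buffer and only reach
// architectural state if the instruction commits.
struct BufferedContext {
    Regs pending;
    void setIntReg(int idx, uint64_t val) { pending[idx] = val; }
    void commit(Regs &arch) const
    {
        for (const auto &kv : pending)
            arch[kv.first] = kv.second;
    }
};

// The same instruction code runs against either context.
template <class Ctx>
void initiateAcc(Ctx &xc, int base, uint64_t newBase)
{
    xc.setIntReg(base, newBase);
}

// Base register value visible after the access has (or has not) faulted.
inline uint64_t simpleRun(bool accessFaults)
{
    Regs arch{{1, 0x1000}};
    SimpleContext xc(arch);
    initiateAcc(xc, 1, 0x1008);
    (void)accessFaults;  // too late: the write is already architectural
    return arch[1];
}

inline uint64_t bufferedRun(bool accessFaults)
{
    Regs arch{{1, 0x1000}};
    BufferedContext xc;
    initiateAcc(xc, 1, 0x1008);
    if (!accessFaults)
        xc.commit(arch);  // a faulting instruction never commits
    return arch[1];
}
```

With no fault both models agree on 0x1008. When the access faults after initiateAcc, the live-update model is left holding 0x1008 while the buffered model still holds 0x1000.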

The problem seems to be that O3 is smarter than SimpleCPU, or really that O3 is more capable at undoing things that shouldn't have happened. One solution might be to make SimpleCPU smarter, but why don't we make O3 selectively dumber?

We might be able to solve this problem if we change the semantics of initiateAcc, the access, and completeAcc for stores. We could do the same for loads for symmetry, but I won't push for it because of the arguments Steve made about base updating loads and the fact that it might not work as well there. Anyway, instead of trying (unsuccessfully) to string initiateAcc, the access, and completeAcc together as one large atomic operation, let's make them all separate. Once initiateAcc finishes, if it doesn't return a fault, it commits. If the access faults later, that's handled, but the state written in initiateAcc is already permanent. If something needs to be rolled back, initiateAcc needs to set up backup state like I talked about in my earlier email. completeAcc would then never be called for stores.
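
A rough sketch of how a CPU model might drive a store under these semantics. State, StoreOp and executeStore are hypothetical names, and restoring the base on an access fault is an assumption about what this particular instruction wants, not something every store would need.

```cpp
#include <cstdint>

enum class Fault { None, Alignment, PageFault };

struct State {
    uint64_t base = 0;
    uint64_t backup = 0;  // backup state set up by initiateAcc
};

// Hypothetical base-updating store.
struct StoreOp {
    int64_t disp;
    bool misaligned;

    // Either faults without having changed anything that will be kept,
    // or saves the old base and writes the new one.
    Fault initiateAcc(State &s) const
    {
        if (misaligned)
            return Fault::Alignment;
        s.backup = s.base;
        s.base += static_cast<uint64_t>(disp);
        return Fault::None;
    }
};

// One driver shared by every CPU model: initiateAcc is its own commit
// point, the access is handled separately, completeAcc is never called.
inline Fault executeStore(const StoreOp &op, State &arch,
                          Fault accessResult)
{
    State scratch = arch;
    Fault f = op.initiateAcc(scratch);
    if (f != Fault::None)
        return f;            // nothing from initiateAcc is committed

    arch = scratch;          // initiateAcc's writes are now permanent

    if (accessResult != Fault::None) {
        // Assumed policy: the fault path restores the base from the
        // backup that initiateAcc committed along with the update.
        arch.base = arch.backup;
        return accessResult;
    }
    return Fault::None;
}
```

Because the backup is committed together with the base update, it is still valid when the access faults, which is what the first version of the idea lacked on O3.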

This is nice because it means all CPUs can behave the same, we get all the benefits of writing back state in initiateAcc, there's no simulated performance overhead as far as I can see, the impact on existing ISA code is minimal, and (I hope) it shouldn't be that hard to implement or carry that much baggage for later.

So what do people think of this second version? Hopefully we don't need a third :-).

Gabe