I don't really understand what you're proposing. We solved this in ARM by using a temp register for the address calculation and then a separate operation to actually do the register update. Since we managed to implement all the wacky LDM/STM variants so they work in atomic, timing and o3, I think the existing mechanisms are sufficient, and I'm not looking for any more upheaval in the code base at the moment.
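For readers following along, the temp-register scheme mentioned above can be sketched roughly as below. This is a toy model, not the actual ARM ISA code: `ToyCPU`, `Fault`, `store_with_update`, the `tmp` register name and the three-step split are all hypothetical names chosen for illustration, and it assumes the register-update step only runs once the access has succeeded.

```python
class Fault(Exception):
    """Hypothetical stand-in for a memory fault."""


class ToyCPU:
    """Toy register file plus memory; bad_addrs are addresses that fault."""

    def __init__(self, regs, bad_addrs=()):
        self.regs = dict(regs)
        self.mem = {}
        self.bad_addrs = set(bad_addrs)

    def store(self, addr, value):
        if addr in self.bad_addrs:
            raise Fault(addr)
        self.mem[addr] = value


def store_with_update(cpu, base, src, offset):
    """Run three hypothetical micro-ops in order, stopping at the first fault."""
    # Step 1: address calculation into a temp register (not architecturally visible).
    cpu.regs["tmp"] = cpu.regs[base] + offset
    # Step 2: the memory access. If it raises Fault, the base is still untouched.
    cpu.store(cpu.regs["tmp"], cpu.regs[src])
    # Step 3: reached only when there was no fault: copy the temp into the base.
    cpu.regs[base] = cpu.regs["tmp"]
```

Because the base register is written only in the last step, a fault in the access leaves the old base value in place on any CPU model, with no rollback needed.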
Ali

On Oct 18, 2010, at 8:16 PM, Gabriel Michael Black wrote:

> Quoting Gabe Black <[email protected]>:
>
>> Gabe Black wrote:
>>> This has come up in ARM and also in X86 with its STUPD (store with
>>> update) microop. The problem has been updating the base register when,
>>> one, the instruction may fault after initiateAcc and the initial value
>>> is lost, and two, completeAcc isn't called by O3. The problem is
>>> compounded by the fact that O3 can speculatively update the register and
>>> recover the old value if there's a fault, and the simple CPUs can't.
>>>
>>> What if we changed the instructions that update the base to update the
>>> base in initiateAcc and store the old value in an architecturally
>>> invisible register? Then, if the instruction faults for whatever reason,
>>> the fault object can know it needs to restore the old value of the base
>>> before vectoring into the fault handler. If the instruction completes
>>> normally the value of the base will be updated for consumption by later
>>> instructions, and the value of the backup register can be ignored. I
>>> don't -think- there would be performance distortions from this since the
>>> actual number of sources/destinations doesn't matter, and this would be
>>> at least a little more realistic and simulator level performant than
>>> splitting things into microops.
>>>
>>> This would be pretty easy to implement, I think, and would be entirely
>>> contained in existing mechanisms in the ISA, so there isn't really any
>>> question there. What I'd like to know is whether people think this is a
>>> reasonable approach to this problem in the first place.
>>>
>>> Gabe
>>> _______________________________________________
>>> m5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/m5-dev
>>>
>>
>> Hmm. This probably won't work.
>> O3 would revert to the old value of the
>> backup register, I think, and the fault object would clobber the
>> correctly restored base register with that old value.
>>
>> Gabe
>>
>
> OK, so, ideally we'd want to put any register updates for a store in
> initiateAcc since they're not dependent on memory, that way other
> instructions can use them sooner, O3 doesn't run completeAcc, etc., but that
> doesn't work because SimpleCPU and O3 are inconsistent as far as the commit
> points an instruction goes through as it runs. In SimpleCPU state is updated
> live, so every setIntReg is a commit point. In O3, the instructions are
> updating a dyninst so the commit point is the actual commit stage. For
> regular instructions this is masked by writing back results at the end of the
> execute function and only if there's no fault, but for memory ops with
> initiateAcc and completeAcc, not all possible faults have happened by the
> point the instruction loses control. The actual commit points of the
> instruction then introduce functional differences and break the consistency
> of the instruction model.
>
> The problem seems to be that O3 is smarter than SimpleCPU, or really that O3
> is more capable of undoing things that shouldn't have happened. One solution
> might be to make SimpleCPU smarter, but why don't we make O3 selectively
> dumber?
>
> We might be able to solve this problem if we change the semantics of
> initiateAcc, the access, and completeAcc for stores. We could do the same for
> loads for symmetry, but I won't push for it because of the arguments Steve
> made about base updating loads and the fact that it might not work as well
> there. Anyway, instead of trying (unsuccessfully) to string initiateAcc, the
> access, and completeAcc together as one large atomic operation, let's make
> them all separate.
> Once initiateAcc finishes, if it doesn't return a fault it
> commits. If the access faults later that's handled, but the state written to
> in initiateAcc is already permanent. If something needs to be rolled back,
> initiateAcc needs to set up backup state like I talked about in my earlier
> email. completeAcc would then never be called for stores.
>
> This is nice because it means all CPUs can behave the same, we get all the
> benefits of writing back state in initiateAcc, there's no simulated
> performance overhead as far as I can see, the impact on existing ISA code is
> minimal, and (I hope) it shouldn't be that hard to implement or carry that
> much baggage for later.
>
> So what do people think of this second version? Hopefully we don't need a
> third :-).
>
> Gabe
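The second proposal quoted above (initiateAcc commits on its own and leaves backup state behind in case the access faults later) can be sketched roughly as follows. This is a toy model under stated assumptions, not simulator code: `ToyCPU`, `Fault`, `initiate_acc`, `access`, `store_with_update` and the `backup` dictionary are hypothetical names, and the restore-on-fault step stands in for whatever the fault object would do before vectoring to the handler.

```python
class Fault(Exception):
    """Hypothetical stand-in for a memory fault object."""


class ToyCPU:
    """Toy register file, backup state and memory; bad_addrs fault on access."""

    def __init__(self, regs, bad_addrs=()):
        self.regs = dict(regs)
        self.backup = {}  # architecturally invisible backup state
        self.mem = {}
        self.bad_addrs = set(bad_addrs)


def initiate_acc(cpu, base, src, offset):
    """Compute the address and commit the base update immediately."""
    value = cpu.regs[src]
    old_base = cpu.regs[base]
    addr = old_base + offset
    cpu.backup[base] = old_base  # saved in case the access faults later
    cpu.regs[base] = addr        # permanent as soon as this step returns
    return addr, value


def access(cpu, addr, value):
    """The memory access itself, which may fault after initiate_acc committed."""
    if addr in cpu.bad_addrs:
        raise Fault(addr)
    cpu.mem[addr] = value


def store_with_update(cpu, base, src, offset):
    """Store with base update; there is no completeAcc step for stores."""
    addr, value = initiate_acc(cpu, base, src, offset)
    try:
        access(cpu, addr, value)
    except Fault:
        # Roll back the base from the backup state before the fault propagates.
        cpu.regs.update(cpu.backup)
        raise
    finally:
        cpu.backup.clear()
```

In this sketch every CPU model would see the same commit point (the end of `initiate_acc`), and the only extra work on the fault path is copying the backup value into the base register.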
