On Fri, Oct 22, 2010 at 10:57 AM, Gabe Black <[email protected]> wrote:
>> Is this just to get STUPD to be a single uop instead of two
>> uops that communicate via a temp reg, without forcing dependent
>> instructions to wait for the STUPD to commit to get the updated base
>> value?
>>
>
> I wouldn't say "just", but essentially yes.
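The two encodings under discussion can be illustrated with a toy model. This is a minimal sketch under a simplified register/memory model, not M5's microop definitions: ToyState, stupd, stTemp and movFromTemp are hypothetical names. Both forms leave the same architectural state. In the two-uop form, the updated base value comes from an ordinary register-to-register uop, so it can be renamed and forwarded like any other result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical toy machine state (not M5's ExecContext or register file).
struct ToyState {
    std::array<uint64_t, 8> intRegs{};  // architectural integer registers
    uint64_t temp = 0;                  // microcode temp register
    std::map<uint64_t, uint64_t> mem;   // sparse memory, one word per address
};

// Single-uop form: store data at base + disp, then write the effective
// address back into the base register (e.g. a push with disp = -8).
void stupd(ToyState &s, std::size_t base, std::size_t data, int64_t disp) {
    uint64_t ea = s.intRegs[base] + static_cast<uint64_t>(disp);
    s.mem[ea] = s.intRegs[data];
    s.intRegs[base] = ea;
}

// Two-uop form, uop 1: store, leaving the effective address in the temp reg.
void stTemp(ToyState &s, std::size_t base, std::size_t data, int64_t disp) {
    s.temp = s.intRegs[base] + static_cast<uint64_t>(disp);
    s.mem[s.temp] = s.intRegs[data];
}

// Two-uop form, uop 2: copy the temp reg into the base register.
void movFromTemp(ToyState &s, std::size_t base) {
    s.intRegs[base] = s.temp;
}
```

Starting from identical states, running `stupd(a, 1, 2, -8)` on one copy and `stTemp(b, 1, 2, -8); movFromTemp(b, 1);` on the other leaves both with the same registers and memory.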
So it seems like the overriding question is: is all this hassle really worth it? How often do we use a STUPD uop dynamically anyway?

>> Do we need another execution phase like completeTrans() that can be
>> overridden here? Generally it's not unreasonable to say that any
>> exception that occurs post-translation on a store is imprecise... I
>> don't know if x86 specifically has any exceptions to that rule.
>>
>
> I think that would be a fairly major change, and 99% of the time
> completeTrans either wouldn't be used or wouldn't do anything, depending
> on how it's implemented.

I'm not overwhelmingly concerned about that... O3 is slow enough that one more virtual function call per dynamic memory access (which will typically hit in the BTB if all the no-op versions point to the same base implementation) probably won't make a major difference. The same goes for calling completeAcc() on stores, though in that case I agree that it still isn't really the right point to do the update. In fact, since O3 explicitly checks whether an instruction is a store-conditional to decide whether to call completeAcc(), it might even be faster to call completeAcc() unconditionally and let the virtual function call replace that if test.

> I don't think we're talking about exceptions
> post translation, just during translation.

Yeah, what I meant was that if you do the update post-translation (including waiting for a delayed translation, so you know the translation didn't fault), then you don't have to worry about rolling it back, because the instruction won't take a later exception. It would therefore be safe to "commit" the value at that point. That does force the update to potentially wait for a page-table walk, though, which is still not ideal.

So one annoying thing is that there's no benefit to doing the update in initiateAcc() for TimingSimpleCPU; the only reason to make that work is so that we can do it in initiateAcc() in O3 and have the same code work in both places.
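The suggestion to call completeAcc() unconditionally on stores can be illustrated with a rough sketch. It assumes heavily simplified, hypothetical stand-ins (StaticInstSketch, PlainStore, StoreConditional, finishStoreChecked, finishStoreUnconditional). M5's real completeAcc() takes a packet, an execution context, and trace data. The sketch only shows that a no-op default in the base class makes the unconditional call behave the same as the flag test. Whether it is actually faster depends on branch-target prediction and would have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a StaticInst hierarchy.
class StaticInstSketch {
  public:
    explicit StaticInstSketch(bool sc = false) : storeConditional(sc) {}
    virtual ~StaticInstSketch() = default;

    // Plain flag test, not a virtual call.
    bool isStoreConditional() const { return storeConditional; }

    // Default is a no-op, so ordinary stores pay only for the dispatch.
    virtual void completeAcc(uint64_t & /*result*/, uint64_t /*resp*/) const {}

  private:
    const bool storeConditional;
};

class PlainStore : public StaticInstSketch {};

class StoreConditional : public StaticInstSketch {
  public:
    StoreConditional() : StaticInstSketch(true) {}
    // Copy the success flag returned by the memory system into the result.
    void completeAcc(uint64_t &result, uint64_t resp) const override {
        result = resp;
    }
};

// Current pattern: test the flag, call completeAcc() only for SCs.
void finishStoreChecked(const StaticInstSketch &inst, uint64_t &result,
                        uint64_t resp) {
    if (inst.isStoreConditional())
        inst.completeAcc(result, resp);
}

// Alternative: always call completeAcc() and let dispatch pick the no-op.
void finishStoreUnconditional(const StaticInstSketch &inst, uint64_t &result,
                              uint64_t resp) {
    inst.completeAcc(result, resp);
}
```

With a PlainStore, both helpers leave the result untouched. With a StoreConditional, both write the memory system's response into it.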
It seems like the problem is that we either call execute() or initiateAcc()/completeAcc(), and in this case we really want to keep calling execute() to do the update in addition to using initiateAcc()/completeAcc(). Again, the easy way to do this is to use two uops. If we really feel we need an alternative, it still feels to me like the right thing to do is to define some new StaticInst method that gets called when initiateAcc() gets called in O3, but gets called when the instruction commits in TimingSimpleCPU. Either that, or find a way for the instruction to know which model it's in, and do the update in initiateAcc() for O3 and in completeAcc() for TimingSimpleCPU. (I really don't like that last one, but I still like it better than implementing speculation via a temp reg inside the instruction definition itself.)

Steve
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
