On Fri, Oct 22, 2010 at 10:57 AM, Gabe Black <[email protected]> wrote:
>> Is this just to get STUPD to be a single uop instead of two
>> uops that communicate via a temp reg, without forcing dependent
>> instructions to wait for the STUPD to commit to get the updated base
>> value?
>>
>
> I wouldn't say "just", but essentially yes.

So it seems like the overriding question is: is all this hassle really
worth it?  How often do we use a STUPD uop dynamically anyway?

>> Do we need another execution phase like completeTrans() that can be
>> overridden here?  Generally it's not unreasonable to say that any
>> exception that occurs post-translation on a store is imprecise... I
>> don't know if x86 specifically has any exceptions to that rule.
>>
>
> I think that would be a fairly major change, and 99% of the time
> completeTrans either wouldn't be used or wouldn't do anything, depending
> on how it's implemented

I'm not overwhelmingly concerned about that... O3 is slow enough that
doing one more virtual function call per dynamic memory access (that
will typically hit in the BTB if all the no-op versions point to the
same base implementation) probably won't make a major difference.

Same with calling completeAcc() on stores, though in that case I agree
that it still isn't really the right point to do the update.  In fact,
since O3 explicitly checks to see if an instruction is a
store-conditional to know whether to call completeAcc(), it might even
be faster to call completeAcc() unconditionally and let the virtual
function call replace that if test.
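
To make that comparison concrete, here is a rough sketch.  The names
(Inst, StoreCond, writebackChecked, writebackUnconditional) are made up
for illustration and are not the real O3 or StaticInst code; it only
assumes that every instruction inherits a no-op completeAcc() from a
common base.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the actual M5 classes.
struct Inst {
    virtual ~Inst() = default;
    virtual bool isStoreConditional() const { return false; }
    // Shared no-op default: every instruction that does not override
    // this ends up at the same call target.
    virtual int completeAcc() { return 0; }
};

struct StoreCond : Inst {
    bool isStoreConditional() const override { return true; }
    // Pretend this writes back the success flag; returns 1 as a marker.
    int completeAcc() override { return 1; }
};

// Style described above: test the flag first, then make the call.
int writebackChecked(Inst &inst) {
    if (inst.isStoreConditional())
        return inst.completeAcc();
    return 0;
}

// Alternative: always call and let virtual dispatch do the selection.
int writebackUnconditional(Inst &inst) {
    return inst.completeAcc();
}
```

Both versions give the same result for every instruction type; whether
the second is actually faster would have to be measured.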

> I don't think we're talking about exceptions
> post translation, just during translation.

Yeah, what I meant was that if you do the update post translation
(including waiting for a delayed translation, so you know the
translation didn't fault), then you don't have to worry about rolling
it back because the instruction won't take a later exception, so it
would be safe to "commit" the value at that point.  That does force
the update to potentially wait for a page-table walk, though, which is
still not ideal.

So one annoying thing is that there's no benefit to doing the update
in initiateAcc() for TimingSimpleCPU; the only reason to make that
work is so that we can do it in initiateAcc() in O3 and have the same
code work in both places.  It seems like the problem is that we either
call execute() or initiateAcc()/completeAcc(), and in this case we
really want to continue to call execute() to do the update in addition
to using initiateAcc()/completeAcc().  Again, the easy way to do this
is to use two uops.  If we really feel we need an alternative, it
still feels to me like the right thing to do is to define some new
StaticInst method that gets called when initiateAcc() gets called in
O3, but gets called when the instruction commits in TimingSimpleCPU.
Either that or find a way for the instruction to know which model it's
in, and do the update in initiateAcc() for O3 and in completeAcc() for
TimingSimpleCPU.  (I really don't like that last one, but I still like
it better than implementing speculation via a temp reg inside the
instruction definition itself.)
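
Roughly what I have in mind for the new-method option, as a sketch
only.  updateRegs(), Context, and the two run*Like() drivers are
hypothetical names, not existing M5 interfaces, and the drivers are
heavily simplified (for instance, the O3-like one calls completeAcc()
unconditionally).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical execution context: one base register plus a call log.
struct Context {
    long base = 0;
    std::vector<std::string> log;
};

// Hypothetical stand-in for a memory StaticInst.
struct MemInst {
    virtual ~MemInst() = default;
    virtual void initiateAcc(Context &c) { c.log.push_back("initiateAcc"); }
    virtual void completeAcc(Context &c) { c.log.push_back("completeAcc"); }
    // Proposed extra hook for non-memory side effects; no-op by default.
    virtual void updateRegs(Context &) {}
};

// A store-with-update: the hook bumps the base register.
struct StoreUpdate : MemInst {
    long disp;
    explicit StoreUpdate(long d) : disp(d) {}
    void updateRegs(Context &c) override {
        c.base += disp;
        c.log.push_back("updateRegs");
    }
};

// O3-like model: the hook runs when the access is initiated, so
// dependent instructions can see the new base early.
void runO3Like(MemInst &inst, Context &c) {
    inst.initiateAcc(c);
    inst.updateRegs(c);
    inst.completeAcc(c);
}

// TimingSimpleCPU-like model: the hook runs at commit, after the
// access has completed.
void runTimingSimpleLike(MemInst &inst, Context &c) {
    inst.initiateAcc(c);
    inst.completeAcc(c);
    inst.updateRegs(c);
}
```

The instruction definition stays identical for both models; only the
point at which the CPU calls the hook differs.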

Steve
