I don't really understand what you're proposing. We solved this in ARM by using 
a temp register for the address calculation and then an operation to actually do 
the register update. Since we managed to implement all the wacky LDM/STM 
variants so they work in atomic, timing and o3 I think the existing mechanisms 
are sufficient and I'm not looking for any more upheaval in the code base at 
the moment.
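
Roughly, the shape of that approach as a toy Python model. This is only a sketch: the register file, the "tmp" register, the mapped-address set, and the function name are all made up for illustration and are not the actual ARM ISA description code.

```python
# Toy model, not simulator code: every name here is hypothetical.
class Fault(Exception):
    """Raised when the store targets an unmapped address."""


def store_with_update(regs, mem, base, src, offset, mapped):
    """Run a base-updating store as three micro-ops on a live register file."""
    # Micro-op 1: compute the address into a temporary, leaving the base alone.
    regs["tmp"] = regs[base] + offset

    # Micro-op 2: the store itself; this is the only step that can fault.
    if regs["tmp"] not in mapped:
        raise Fault(hex(regs["tmp"]))
    mem[regs["tmp"]] = regs[src]

    # Micro-op 3: the base update, reached only if the store did not fault.
    regs[base] = regs["tmp"]


regs = {"r0": 0x100, "r1": 42, "tmp": 0}
mem = {}
store_with_update(regs, mem, "r0", "r1", 4, mapped={0x104})
```

Because the base is written only by the last step, a fault in the store leaves the original base value in place even on a CPU model that updates state live.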

Ali



On Oct 18, 2010, at 8:16 PM, Gabriel Michael Black wrote:

> Quoting Gabe Black <[email protected]>:
> 
>> Gabe Black wrote:
>>> This has come up in ARM and also in X86 with its STUPD (store with
>>> update) microop. The problem has been updating the base register when,
>>> one, the instruction may fault after initiateAcc and the initial value
>>> is lost, and two, completeAcc isn't called by O3. The problem is
>>> compounded by the fact that O3 can speculatively update the register and
>>> recover the old value if there's a fault, and the simple CPUs can't.
>>> 
>>> What if we changed the instructions that update the base to update the
>>> base in initiateAcc and store the old value in an architecturally
>>> invisible register? Then, if the instruction faults for whatever reason,
>>> the fault object can know it needs to restore the old value of the base
>>> before vectoring into the fault handler. If the instruction completes
>>> normally the value of the base will be updated for consumption by later
>>> instructions, and the value of the backup register can be ignored. I
>>> don't -think- there would be performance distortions from this since the
>>> actual number of sources/destinations doesn't matter, and this would be
>>> at least a little more realistic, and faster at the simulator level, than
>>> splitting things into microops.
>>> 
>>> This would be pretty easy to implement, I think, and would be entirely
>>> contained in existing mechanisms in the ISA, so there isn't really any
>>> question there. What I'd like to know is whether people think this is a
>>> reasonable approach to this problem in the first place.
>>> 
>>> Gabe
>>> _______________________________________________
>>> m5-dev mailing list
>>> [email protected]
>>> http://m5sim.org/mailman/listinfo/m5-dev
>>> 
>> 
>> Hmm. This probably won't work. O3 would revert to the old value of the
>> backup register, I think, and the fault object would clobber the
>> correctly restored base register with that old value.
>> 
>> Gabe
>> 
> 
> OK, so, ideally we'd want to put any register updates for a store in 
> initiateAcc since they're not dependent on memory. That way other 
> instructions can use them sooner, O3 doesn't run completeAcc, etc. But that 
> doesn't work because SimpleCPU and O3 are inconsistent as far as the commit 
> points an instruction goes through as it runs. In SimpleCPU state is updated 
> live, so every setIntReg is a commit point. In O3, the instructions are 
> updating a dyninst, so the commit point is the actual commit stage. For 
> regular instructions this is masked by writing back results at the end of the 
> execute function and only if there's no fault, but for memory ops with 
> initiateAcc and completeAcc, not all possible faults have happened by the 
> point the instruction loses control. The actual commit points of the 
> instruction then introduce functional differences and break the consistency 
> of the instruction model.
> 
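
To make the commit-point mismatch described above concrete, here is a toy Python model. It is a sketch only: LiveContext, BufferedContext, set_int_reg, and run are hypothetical stand-ins, not the real SimpleCPU or O3 interfaces.

```python
# Toy model of the two commit disciplines; every name is hypothetical.
class LiveContext:
    """SimpleCPU-like: each register write lands in architectural state at once."""

    def __init__(self, regs):
        self.regs = regs

    def set_int_reg(self, name, value):
        self.regs[name] = value

    def finish(self, fault):
        pass  # nothing to apply or undo; the writes are already visible


class BufferedContext:
    """O3-like: writes stay with the in-flight instruction until it commits."""

    def __init__(self, regs):
        self.regs = regs
        self.pending = {}

    def set_int_reg(self, name, value):
        self.pending[name] = value

    def finish(self, fault):
        if not fault:
            self.regs.update(self.pending)  # commit stage
        self.pending.clear()                # a faulting instruction is squashed


def initiate_acc(ctx, base_value, offset):
    """A base-updating store that writes the base before the access resolves."""
    ctx.set_int_reg("base", base_value + offset)


def run(ctx_cls, access_faults):
    """Return the architectural base after the store, given the access outcome."""
    regs = {"base": 0x100}
    ctx = ctx_cls(regs)
    initiate_acc(ctx, regs["base"], 4)
    ctx.finish(access_faults)
    return regs["base"]
```

With a faulting access, run(LiveContext, True) leaves the base at 0x104 while run(BufferedContext, True) leaves it at 0x100. The same instruction definition therefore produces different architectural state depending on the CPU model.
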
> The problem seems to be that O3 is smarter than SimpleCPU, or really that O3 
> is more capable at undoing things that shouldn't have happened. One solution 
> might be to make SimpleCPU smarter, but why don't we make O3 selectively 
> dumber?
> 
> We might be able to solve this problem if we change the semantics of 
> initiateAcc, the access, and completeAcc for stores. We could do the same for 
> loads for symmetry, but I won't push for it because of the arguments Steve 
> made about base updating loads and the fact that it might not work as well 
> there. Anyway, instead of trying (unsuccessfully) to string initiateAcc, the 
> access, and completeAcc together as one large atomic operation, let's make 
> them all separate. Once initiateAcc finishes, if it doesn't return a fault it 
> commits. If the access faults later that's handled, but the state written 
> in initiateAcc is already permanent. If something needs to be rolled back, 
> initiateAcc needs to set up backup state like I talked about in my earlier 
> email. completeAcc would then never be called for stores.
> 
> This is nice because it means all CPUs can behave the same, we get all the 
> benefits of writing back state in initiateAcc, there's no simulated 
> performance overhead as far as I can see, the impact on existing ISA code is 
> minimal, and (I hope) it shouldn't be that hard to implement or carry that 
> much baggage for later.
> 
> So what do people think of this second version? Hopefully we don't need a 
> third :-).
> 
> Gabe
> 
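
A sketch of how the second version quoted above might look, as a toy Python model. It assumes, as that proposal does, that whatever initiateAcc writes is committed on every CPU model once it returns without a fault. The "backup" register, StoreFault, and do_store are hypothetical names used only for illustration.

```python
# Toy model of the proposal; "backup" stands in for an architecturally
# invisible register, and every name here is hypothetical.
class StoreFault:
    """Fault object that knows which base register it has to restore."""

    def __init__(self, base):
        self.base = base

    def invoke(self, regs):
        # Put the old base back before vectoring into the fault handler.
        regs[self.base] = regs["backup"]


def initiate_acc(regs, base, offset):
    """Save the old base, update it, and return the effective address."""
    regs["backup"] = regs[base]
    regs[base] += offset
    return regs[base]


def do_store(regs, mem, base, src, offset, mapped):
    """Run the store; return a fault object, or None on success."""
    addr = initiate_acc(regs, base, offset)  # treated as committed from here on
    if addr not in mapped:
        fault = StoreFault(base)
        fault.invoke(regs)
        return fault
    mem[addr] = regs[src]  # no completeAcc step for stores
    return None
```

On a successful store the new base is visible as soon as initiate_acc returns and the backup value is simply ignored. On a faulting access the fault object restores the base from the backup register, so later code sees the original value.
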

