I think you guys are on the right track. There's a non-speculative flag,
a serialize-before flag, and a serialize-after flag. I'm not sure which
one is exactly right, but some combination should be. We should be
careful not to overdo it since that might artificially hurt performance,
but I don't *think* the lock prefix is used all that much these days, so
it shouldn't have *too* bad an impact if it isn't perfectly correct.

Gabe

On 10/26/11 09:56, Nilay Vaish wrote:
> On Tue, 25 Oct 2011, Steve Reinhardt wrote:
>
>> Good questions.  Clearly if we ever let the R part of an RMW
>> instruction out to the cache, either we have to commit the
>> instruction or add some mechanism to unlock the block.  One solution
>> would be to mark all RMW instructions as serializing, which would
>> prevent them from executing speculatively.  That (or something like
>> it) might be necessary to get the consistency model right anyway,
>> since I believe locked accesses act as fences (?? is that right,
>> Brad?).
>>
>> Gabe, did you have an alternate solution in mind?
>>
>> Steve
>>
>> On Tue, Oct 25, 2011 at 2:15 PM, Nilay Vaish <[email protected]> wrote:
>>
>>> Does this mean that an x86 O3 CPU will never squash an RMW
>>> instruction? I am posting an instruction + protocol trace obtained
>>> from O3 and Ruby. In the first portion, you can see that the O3 CPU
>>> issues a locked RMW with the read part having sn = 3051 and the
>>> write part having sn = 3052. In the second portion, you can see that
>>> 3051 and 3052 are squashed, and in the third portion of the trace,
>>> these are committed. There are several things that I am not able to
>>> understand. Why is the RMW squashed, since the x86 architecture has
>>> to commit the instruction? Secondly, if the RMW was being executed
>>> speculatively, then what mechanism exists for informing the cache
>>> controller about the instruction getting squashed? Thirdly, why was
>>> the instruction committed later on, when it was originally squashed?
>>>
>
> When I mark ldstl and stul as non-speculative, the O3 CPU and Ruby
> work correctly on an example program in which two threads increment
> a counter. Since a locked RMW is a fence instruction (Steve suggested
> this above and AMD's manual agrees), it seems that all loads and
> stores that appear before the read portion in program order should
> commit first. This means that ldstl should be marked as a memory
> barrier, and similarly stul should also be marked as a memory barrier.
> But looking at src/arch/x86/isa/microops/ldstop.isa, it does not seem
> that these flags are currently supported. If others (especially
> Steve and Gabe) concur with my understanding, I can modify the file
> to add the memory barrier flag.
>
> -- 
> Nilay
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev

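The two-thread counter test mentioned in the quoted message is not
included in the thread. A minimal sketch of what such a test might look
like is below; the function name run_counter_test and the iteration
count are assumptions made for illustration, not the program that was
actually run. On x86, an atomic fetch_add on an integer is normally
compiled to a lock-prefixed instruction, which is the kind of locked RMW
being discussed here.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for the test program described in the thread.
// Two threads each perform `iters` atomic increments on a shared counter
// and the final value is returned.
int run_counter_test(int iters)
{
    std::atomic<int> counter{0};

    auto worker = [&counter, iters]() {
        for (int i = 0; i < iters; ++i) {
            // On x86 this is typically emitted as a lock-prefixed RMW.
            counter.fetch_add(1, std::memory_order_seq_cst);
        }
    };

    std::thread t1(worker);
    std::thread t2(worker);
    t1.join();
    t2.join();

    return counter.load();
}
```

If the locked RMW is handled atomically, calling run_counter_test(n)
from main should return 2 * n; a smaller result would suggest that an
increment was lost between the read and write halves.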
