I'd say that if atomic accesses aren't atomic, then multiprocessor systems do not work, and that's not a particularly small exception. I suppose you could still run multiprogrammed single-thread workloads in SE mode, but that's a small exception to it not working, not the other way around. Uniprocessor FS is also very suspect since you can still have atomicity violations with device accesses. I think only uniprocessor SE mode can be considered working if atomic accesses aren't atomic.
This is not really related to completeness; it's a fundamental operation that is implemented but is not guaranteed to give the correct answer. The same thing goes for not getting the consistency model right. I didn't touch the MIPS table at all, so if it's out of date, go ahead and update it.

Steve

On Sun, Sep 18, 2011 at 6:59 PM, Gabe Black <[email protected]> wrote:

> I think X86 is excessively red. Timing simple CPU should be yellow for SE and FS uni- and multiprocessor, except for the small exception that atomic accesses aren't atomic. The implementation is always going to be incomplete because X86 is so large and frequently ambiguous, and because only a fraction of it is actually useful. Marking it as "definitely does not work" is a bit draconian. O3 support in SE should be the same, and I'd say O3 in FS should be orange.
>
> On the other hand, MIPS is overly not red. There is no MIPS_FS target because one wouldn't compile as of today, so everything FS should be red.
>
> I don't know the status of Ruby on anything, so I can't comment on those.
>
> Gabe
>
> On 09/18/11 18:17, Steve Reinhardt wrote:
>
> Yea, whether you call it a bug or an unimplemented feature, it still doesn't work... it's definitely a bug that that's not documented, though. I updated the status matrix to reflect this problem: http://gem5.org/Status_Matrix
>
> (I also did a bunch of general editing on the status matrix too... Gabe, you may want to check it out and see what you think.)
>
> Note that this is a problem only in the "classic" m5 cache models; Ruby does support x86 locking. However, Ruby doesn't support O3 LSQ probes to enforce stronger consistency models, so this gets you TimingSimple CPUs but not O3 CPUs.
>
> Adding locked RMW access to the classic caches is doable, but not completely trivial...
> basically, if a snoop (probe) arrives that would downgrade access to a locked block, that snoop has to be deferred and processed after the lock is released. There's already support in the protocol for deferring snoops that hit on an MSHR, but the details of how that's extended to handle locked blocks are TBD. I expect the solution involves either (1) adding a bit to the tags to mark locked blocks or (2) allocating an MSHR or MSHR-like structure for each locked block. There are pros and cons to each. I don't have time to implement this myself, but if someone else wants to take a crack, I'd be glad to consult.
>
> Do we have an issue in O3 with speculatively issuing the read part of a locked RMW (which locks the block) but then, due to a squash, not issuing the write that unlocks it? That seems like a tricky bit... I don't know if Ruby handles this or not.
>
> Steve
>
> On Sat, Sep 17, 2011 at 4:39 PM, Gabriel Michael Black <[email protected]> wrote:
>
>> Hi Meredydd. I'd say this isn't a bug, per se, but it is wrong. Basically the support for locking memory operations is incomplete.
>>
>> The way this is supposed to work is that a load with the LOCKED flag set will lock a chunk of memory, and then a subsequent store with the LOCKED flag set will unlock it. All stores with LOCKED set must be preceded by a load with that flag set. You could think of the load as acquiring a mutex and the store as releasing it.
>>
>> In atomic mode, because gem5 is single-threaded and because atomic memory accesses complete immediately, the only thing you need to do to make sure locked memory accesses aren't interrupted by anything is to make sure the CPU keeps control until the locked section is complete. To do that we just keep track of whether or not we've executed a locked load and don't stop executing instructions until we see a locked store. This is what you're seeing in the atomic mode CPU.
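The atomic-mode bookkeeping described above (a locked load sets a flag, and the CPU keeps control until the matching locked store clears it) can be sketched roughly as follows. All names here are illustrative, not gem5's actual code:

```c
#include <stdbool.h>

/* Hypothetical sketch of atomic-mode locked-access tracking: a locked
 * load marks the CPU as holding a lock, and the CPU refuses to give up
 * control until the matching locked store releases it. */
typedef struct {
    bool locked; /* saw a locked load with no matching locked store yet */
} CpuLockState;

static void execute_locked_load(CpuLockState *cpu)  { cpu->locked = true;  }
static void execute_locked_store(CpuLockState *cpu) { cpu->locked = false; }

/* The CPU may only stop executing instructions when no lock is held. */
static bool may_yield(const CpuLockState *cpu) { return !cpu->locked; }
```

Because atomic-mode accesses complete instantly and the simulator is single-threaded, this flag alone is enough to make the locked sequence appear atomic; nothing else can run in between.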
>>
>> In timing mode, which is what all other CPUs use including the timing simple CPU, something more complex is needed because memory accesses take "time" and other things can happen while the CPU waits for a response. In that case, the locking would have to actually happen in the memory system, and the various components (caches, memory, or something else) would have to keep track of which areas of memory (if any) are currently locked. This is the part that isn't yet implemented.
>>
>> So in summary, yes, it is known to not work properly, but I wouldn't call it a bug; I'd say that it's just not finished yet.
>>
>> Gabe
>>
>> Quoting Meredydd Luff <[email protected]>:
>>
>>> It appears that the CAS (LOCK; CMPXCHGx) instruction doesn't do what it says on the tin, at least using the O3 model and X86_SE. When I run the following code (inside a container that runs this code once on each of four processors):
>>>
>>> volatile unsigned long *x;
>>> [...]
>>> for(a=0; a<1000; a++) {
>>>     while(lastx = *x, oldx = cas(x, lastx, lastx+1), oldx != lastx);
>>> }
>>>
>>> ...I get final x values of 1200 or so (rather than 4000, as would happen if the compare-and-swap were atomic). This is using the standard se.py and a fresh checkout of the gem5 repository - my command line is:
>>>
>>> build/X86_SE/m5.opt configs/example/se.py -d --caches -n 4 -c /path/to/my/binary
>>>
>>> Is this a known bug? Looking at the x86 microcode, it appears that the relevant microops are ldstl and stul. Their only difference from what appear to be their unlocked equivalents (ldst and st) is the addition of the Request::LOCKED flag. A quick grep indicates that that LOCKED flag is only accessed by the Request::isLocked() accessor function, and that isLocked() is not referenced anywhere except twice in cpu/simple/atomic.cc.
>>>
>>> Unless I'm missing something, it appears that atomic memory accesses are simply not implemented.
>>> Is this true?
>>>
>>> Meredydd
>>>
>>> PS - This is the CAS I'm using:
>>>
>>> static inline unsigned long cas(volatile unsigned long *ptr,
>>>                                 unsigned long old, unsigned long _new)
>>> {
>>>     unsigned long prev;
>>>     asm volatile("lock;"
>>>                  "cmpxchgq %1, %2;"
>>>                  : "=a"(prev)
>>>                  : "q"(_new), "m"(*ptr), "0"(old)
>>>                  : "memory");
>>>     return prev;
>>> }
>>>
>>> PPS - I searched around this issue, and the only relevant thing I found was a mailing list post from last year indicating that ldstl and stul were working for someone (no indication that was using O3, though): http://www.mail-archive.com/[email protected]/msg07297.html This would indicate that at least one CPU model does support atomicity - but even looking in atomic.cc, I can't immediately see why that would work!
>>>
>>> There is some code for handling a flag called Request::MEM_SWAP_COND/isCondSwap(), but it appears to be generated only by the SPARC ISA, and examined only by the simple timing and atomic models.
>>>
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
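For comparison, the same compare-and-swap can be written without inline assembly using the GCC/Clang `__sync_val_compare_and_swap` builtin, which on x86-64 should compile down to the same lock-prefixed `cmpxchgq`. A minimal equivalent sketch:

```c
/* Equivalent of the inline-asm cas() above using a compiler builtin.
 * Returns the value that was in *ptr before the operation, matching
 * the semantics of the hand-written version. */
static inline unsigned long cas(volatile unsigned long *ptr,
                                unsigned long old_val,
                                unsigned long new_val)
{
    return __sync_val_compare_and_swap(ptr, old_val, new_val);
}
```

The swap succeeded exactly when the returned value equals `old_val`, which is the condition the poster's retry loop tests.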
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
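The poster's expectation is the standard correctness check for CAS: four threads each performing 1000 CAS-based increments must leave the counter at exactly 4000 if and only if the compare-and-swap is atomic. A self-contained recreation of that check for real hardware (using the compiler builtin rather than inline asm; names are my own):

```c
#include <pthread.h>
#include <stddef.h>

static volatile unsigned long counter;

/* Each thread performs 1000 CAS-based increments, retrying on failure,
 * exactly like the loop in the original report. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        unsigned long lastx, oldx;
        do {
            lastx = counter;
            oldx = __sync_val_compare_and_swap(&counter, lastx, lastx + 1);
        } while (oldx != lastx);
    }
    return NULL;
}

/* Run four threads and return the final counter value; with an atomic
 * CAS this is always 4000, never the ~1200 seen in the simulator. */
static unsigned long run_cas_test(void)
{
    pthread_t threads[4];
    counter = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return counter;
}
```

A result below 4000 means some increments were lost to a non-atomic read-modify-write, which is precisely the symptom reported against X86_SE with the classic caches.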
