Oh, also, POWER is in the same boat as MIPS: there's no FS build, so all of its FS entries should be red.
Gabe

On 09/18/11 18:59, Gabe Black wrote:
> I think X86 is excessively red. The timing simple CPU should be yellow for
> SE and FS, uniprocessor and multiprocessor, except for the small caveat that
> atomic accesses aren't atomic. The implementation is always going to be
> incomplete, because X86 is so large and frequently ambiguous and because
> only a fraction of it is actually useful. Marking it as "definitely does not
> work" is a bit draconian. O3 support in SE should be the same, and I'd say
> O3 in FS should be orange.
>
> On the other hand, MIPS isn't red enough. There is no MIPS_FS target because
> one wouldn't compile as of today, so everything FS should be red.
>
> I don't know the status of Ruby on anything, so I can't comment on those.
>
> Gabe
>
> On 09/18/11 18:17, Steve Reinhardt wrote:
>> Yeah, whether you call it a bug or an unimplemented feature, it still
>> doesn't work... it's definitely a bug that that isn't documented, though.
>> I updated the status matrix to reflect this problem:
>> http://gem5.org/Status_Matrix
>>
>> (I also did a bunch of general editing on the status matrix... Gabe, you
>> may want to check it out and see what you think.)
>>
>> Note that this is a problem only in the "classic" m5 cache models; Ruby
>> does support x86 locking. However, Ruby doesn't support the O3 LSQ probes
>> used to enforce stronger consistency models, so that gets you TimingSimple
>> CPUs but not O3 CPUs.
>>
>> Adding locked RMW access to the classic caches is doable, but not
>> completely trivial... basically, if a snoop (probe) arrives that would
>> downgrade access to a locked block, that snoop has to be deferred and
>> processed after the lock is released. There's already support in the
>> protocol for deferring snoops that hit on an MSHR, but the details of how
>> that gets extended to handle locked blocks are TBD. I expect the solution
>> involves either (1) adding a bit to the tags to mark locked blocks or
>> (2) allocating an MSHR or MSHR-like structure for each locked block. There
>> are pros and cons to each. I don't have time to implement this myself, but
>> if someone else wants to take a crack at it, I'd be glad to consult.
>>
>> Do we have an issue in O3 with speculatively issuing the read part of a
>> locked RMW (which locks the block) but then, due to a squash, not issuing
>> the write that unlocks it? That seems like a tricky bit... I don't know
>> whether Ruby handles this or not.
>>
>> Steve
>>
>> On Sat, Sep 17, 2011 at 4:39 PM, Gabriel Michael Black
>> <[email protected]> wrote:
>>
>> Hi Meredydd. I'd say this isn't a bug per se, but it is wrong. Basically,
>> the support for locking memory operations is incomplete.
>>
>> The way this is supposed to work is that a load with the LOCKED flag set
>> will lock a chunk of memory, and then a subsequent store with the LOCKED
>> flag set will unlock it. All stores with LOCKED set must be preceded by a
>> load with it set. You can think of the load as acquiring a mutex and the
>> store as releasing it.
>>
>> In atomic mode, because gem5 is single threaded and because atomic memory
>> accesses complete immediately, the only thing you need to do to make sure
>> locked memory accesses aren't interrupted by anything is to make sure the
>> CPU keeps control until the locked section is complete. To do that, we just
>> keep track of whether or not we've executed a locked load and don't stop
>> executing instructions until we see a locked store. This is what you're
>> seeing in the atomic mode CPU.
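[Editor's note: for readers unfamiliar with gem5's atomic mode, the bookkeeping
described in the paragraph above can be sketched roughly as follows. This is a
self-contained toy in C++, not the actual cpu/simple/atomic.cc source; the names
(ToyInst, runOneTick, lockedSectionOpen) are invented for illustration.]

    #include <cstddef>
    #include <vector>

    // Toy stand-in for a decoded instruction; `locked` plays the role of a
    // memory request carrying the Request::LOCKED flag.
    struct ToyInst {
        bool isLoad = false;
        bool isStore = false;
        bool locked = false;
    };

    // Execute instructions back to back, refusing to hand control of the
    // single-threaded simulation to anything else while a locked
    // load ... store section is open.
    void runOneTick(const std::vector<ToyInst> &stream)
    {
        bool lockedSectionOpen = false;
        std::size_t pc = 0;

        do {
            if (pc >= stream.size())
                break;
            const ToyInst &inst = stream[pc++];

            // ... perform the (immediately completing) atomic-mode access ...

            if (inst.locked && inst.isLoad)
                lockedSectionOpen = true;   // "acquire the mutex"
            if (inst.locked && inst.isStore)
                lockedSectionOpen = false;  // "release the mutex"
        } while (lockedSectionOpen);

        // Only once the locked store has executed does the CPU stop and let
        // other simulation objects run.
    }

[The real check lives in the atomic simple CPU, which is where the
Request::isLocked() references in cpu/simple/atomic.cc mentioned later in this
thread come in.]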
>> In timing mode, which is what all the other CPUs use, including the timing
>> simple CPU, something more complex is needed, because memory accesses take
>> "time" and other things can happen while the CPU waits for a response. In
>> that case the locking would have to actually happen in the memory system,
>> and the various components (caches, memory, or something else) would have
>> to keep track of which areas of memory (if any) are currently locked. This
>> is the part that isn't implemented yet.
>>
>> So, in summary: yes, it is known not to work properly, but I wouldn't call
>> it a bug; I'd say it just isn't finished yet.
>>
>> Gabe
>>
>> Quoting Meredydd Luff <[email protected]>:
>>
>> It appears that the CAS (LOCK; CMPXCHGx) instruction doesn't do what it
>> says on the tin, at least using the O3 model and X86_SE. When I run the
>> following code (inside a container that runs it once on each of four
>> processors):
>>
>>     volatile unsigned long *x;
>>     [...]
>>     for (a = 0; a < 1000; a++) {
>>         while (lastx = *x, oldx = cas(x, lastx, lastx + 1), oldx != lastx)
>>             ;
>>     }
>>
>> ...I get final x values of 1200 or so (rather than 4000, as would happen
>> if the compare-and-swap were atomic). This is using the standard se.py and
>> a fresh checkout of the gem5 repository - my command line is:
>>
>>     build/X86_SE/m5.opt configs/example/se.py -d --caches -n 4 -c /path/to/my/binary
>>
>> Is this a known bug? Looking at the x86 microcode, it appears that the
>> relevant microops are ldstl and stul. Their only difference from what
>> appear to be their unlocked equivalents (ldst and st) is the addition of
>> the Request::LOCKED flag. A quick grep indicates that that LOCKED flag is
>> only accessed through the Request::isLocked() accessor function, and that
>> isLocked() is not referenced anywhere except twice in cpu/simple/atomic.cc.
>>
>> Unless I'm missing something, it appears that atomic memory accesses are
>> simply not implemented. Is this true?
>>
>> Meredydd
>>
>> PS - This is the CAS I'm using:
>>
>>     static inline unsigned long
>>     cas(volatile unsigned long *ptr, unsigned long old, unsigned long _new)
>>     {
>>         unsigned long prev;
>>         asm volatile("lock;"
>>                      "cmpxchgq %1, %2;"
>>                      : "=a"(prev)
>>                      : "q"(_new), "m"(*ptr), "0"(old)
>>                      : "memory");
>>         return prev;
>>     }
>>
>> PPS - I searched around this issue, and the only relevant thing I found
>> was a mailing list post from last year indicating that ldstl and stul were
>> working for someone (no indication that they were using O3, though):
>> http://www.mail-archive.com/[email protected]/msg07297.html
>> This would indicate that at least one CPU model does support atomicity -
>> but even looking at atomic.cc, I can't immediately see why that would work!
>>
>> There is some code for handling a flag called Request::MEM_SWAP_COND /
>> isCondSwap(), but it appears to be generated only by the SPARC ISA and
>> examined only by the simple timing and atomic models.
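[Editor's note: to make the missing piece concrete, the timing-mode locking
Gabe describes and the snoop deferral Steve outlines could look roughly like
the toy structure below. This is a sketch under assumptions, not gem5 code:
LockTable, Snoop, and the other names are invented for illustration, and a
real implementation would live in the classic cache, either as a bit in the
tags or as an MSHR-like entry per Steve's two options.]

    #include <cstdint>
    #include <deque>
    #include <unordered_set>

    using Addr = std::uint64_t;

    // Toy stand-in for an incoming snoop/probe that would downgrade a block.
    struct Snoop {
        Addr blockAddr;
        // ... source, required coherence state, etc. ...
    };

    class LockTable {
        std::unordered_set<Addr> lockedBlocks;  // option (1) would be a tag bit instead
        std::deque<Snoop> deferredSnoops;       // snoops held back until an unlock

      public:
        // A locked load reaches the cache: remember that the block is locked.
        void lockedLoad(Addr block) { lockedBlocks.insert(block); }

        // The matching locked store arrives: unlock and replay deferred snoops.
        void lockedStore(Addr block)
        {
            lockedBlocks.erase(block);

            std::deque<Snoop> stillDeferred;
            for (const Snoop &s : deferredSnoops) {
                if (s.blockAddr == block) {
                    // ... hand s back to the normal snoop-handling path here ...
                } else {
                    stillDeferred.push_back(s);  // its block is still locked
                }
            }
            deferredSnoops.swap(stillDeferred);
        }

        // Called when a snoop arrives: returns true if it must be deferred
        // because acting on it would downgrade a locked block.
        bool maybeDefer(const Snoop &s)
        {
            if (lockedBlocks.count(s.blockAddr)) {
                deferredSnoops.push_back(s);
                return true;
            }
            return false;
        }
    };

[Even with something like this in the caches, the O3 question Steve raises
above still stands: a squashed locked load with no matching locked store would
leave the block locked forever unless squash handling also releases it.]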
