There's a heck of a lot more to an ISA implementation than one primitive. By making a blanket statement that it doesn't work at all, you negate all the things it does correctly, and has done correctly for years. The fact that no one has bothered to fix the memory system is not a defect of x86, and the fact that some instructions may not work in all situations because of it does not put x86 in the same category as functionality that doesn't even compile.
Gabe

On 09/18/11 19:53, Steve Reinhardt wrote:
> I'd say that if atomic accesses aren't atomic, then multiprocessor
> systems do not work, and that's not a particularly small exception. I
> suppose you could still run multiprogrammed single-thread workloads in
> SE mode, but that's a small exception to it not working, not the other
> way around. Uniprocessor FS is also very suspect, since you can still
> have atomicity violations with device accesses. I think only
> uniprocessor SE mode can be considered working if atomic accesses
> aren't atomic.
>
> This is not really related to completeness; it's a fundamental
> operation that is implemented but is not guaranteed to give the
> correct answer. The same thing goes for not getting the consistency
> model right.
>
> I didn't touch the MIPS table at all, so if it's out of date, go ahead
> and update it.
>
> Steve
>
> On Sun, Sep 18, 2011 at 6:59 PM, Gabe Black <[email protected]> wrote:
>
> I think X86 is excessively red. Timing simple CPU should be yellow
> for SE and FS, uni- and multiprocessor, except for the small
> exception that atomic accesses aren't atomic. The implementation
> is always going to be incomplete because X86 is so large and
> frequently ambiguous, and because only a fraction of it is actually
> useful. Marking it as "definitely does not work" is a bit
> draconian. O3 support in SE should be the same, and I'd say O3 in
> FS should be orange.
>
> On the other hand, MIPS is overly not red. There is no MIPS_FS
> target because one wouldn't compile as of today, so everything FS
> should be red.
>
> I don't know the status of Ruby on anything, so I can't comment on
> those.
>
> Gabe
>
> On 09/18/11 18:17, Steve Reinhardt wrote:
>> Yea, whether you call it a bug or an unimplemented feature, it
>> still doesn't work... it's definitely a bug that that's not
>> documented, though. I updated the status matrix to reflect this
>> problem: http://gem5.org/Status_Matrix
>>
>> (I also did a bunch of general editing on the status matrix...
>> Gabe, you may want to check it out and see what you think.)
>>
>> Note that this is a problem only in the "classic" m5 cache
>> models; Ruby does support x86 locking. However, Ruby doesn't
>> support O3 LSQ probes to enforce stronger consistency models, so
>> this gets you TimingSimple CPUs but not O3 CPUs.
>>
>> Adding locked RMW access to the classic caches is doable, but not
>> completely trivial... basically, if a snoop (probe) arrives that
>> would downgrade access to a locked block, that snoop has to be
>> deferred and processed after the lock is released. There's
>> already support in the protocol for deferring snoops that hit on
>> an MSHR, but the details of how that's extended to handle locked
>> blocks are TBD. I expect the solution involves either (1) adding
>> a bit to the tags to mark locked blocks or (2) allocating an MSHR
>> or MSHR-like structure for each locked block. There are pros and
>> cons to each. I don't have time to implement this myself, but if
>> someone else wants to take a crack at it, I'd be glad to consult.
>>
>> Do we have an issue in O3 with speculatively issuing the read
>> part of a locked RMW (which locks the block) but then, due to a
>> squash, not issuing the write that unlocks it? That seems like a
>> tricky bit... I don't know if Ruby handles this or not.
>>
>> Steve
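To make the deferral idea above concrete, here is a minimal sketch of option (1), a locked bit in the tags. All names and structure here are illustrative assumptions, not actual gem5 code:

    // Rough sketch of option (1): a "locked" bit in the cache tags.
    // Illustrative only -- none of these names come from gem5.
    #include <deque>

    struct CacheBlock {
        bool locked = false;  // set by a LOCKED load, cleared by the LOCKED store
        // ... tag, coherence state, data ...
    };

    struct Snoop {
        // ... address, requested permissions ...
    };

    class Cache {
        std::deque<Snoop> deferredSnoops;  // snoops waiting on a locked block

      public:
        void handleSnoop(CacheBlock &blk, const Snoop &snoop) {
            if (blk.locked) {
                // This snoop would downgrade a locked block; hold it
                // until the locking store releases the block.
                deferredSnoops.push_back(snoop);
                return;
            }
            processSnoop(blk, snoop);
        }

        void handleLockedStore(CacheBlock &blk) {
            blk.locked = false;  // the RMW pair is complete
            // Replay whatever was waiting on the lock, in arrival order.
            while (!deferredSnoops.empty()) {
                processSnoop(blk, deferredSnoops.front());
                deferredSnoops.pop_front();
            }
        }

      private:
        void processSnoop(CacheBlock &, const Snoop &) {
            // ... normal downgrade/invalidate handling ...
        }
    };

Option (2) would replace the per-block bit with an MSHR-like entry allocated at the locked load, which also gives a natural place to queue the deferred snoops.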
>> On Sat, Sep 17, 2011 at 4:39 PM, Gabriel Michael Black
>> <[email protected]> wrote:
>>
>> Hi Meredydd. I'd say this isn't a bug per se, but it is
>> wrong. Basically, the support for locking memory operations is
>> incomplete.
>>
>> The way this is supposed to work is that a load with the
>> LOCKED flag set will lock a chunk of memory, and then a
>> subsequent store with the LOCKED flag set will unlock it. All
>> stores with LOCKED set must be preceded by a load with it
>> set. You can think of the load as acquiring a mutex and the
>> store as releasing it.
>>
>> In atomic mode, because gem5 is single threaded and because
>> atomic memory accesses complete immediately, the only thing
>> you need to do to make sure locked memory accesses aren't
>> interrupted by anything is to make sure the CPU keeps control
>> until the locked section is complete. To do that, we just keep
>> track of whether or not we've executed a locked load and
>> don't stop executing instructions until we see a locked
>> store. This is what you're seeing in the atomic mode CPU.
>>
>> In timing mode, which is what all other CPUs use, including
>> the timing simple CPU, something more complex is needed
>> because memory accesses take "time" and other things can
>> happen while the CPU waits for a response. In that case, the
>> locking would have to actually happen in the memory system,
>> and the various components (caches, memory, or something
>> else) would have to keep track of what areas of memory (if
>> any) are currently locked. This is the part that isn't yet
>> implemented.
>>
>> So in summary, yes, it is known to not work properly, but I
>> wouldn't call it a bug; I'd say that it's just not finished yet.
>>
>> Gabe
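A bare-bones sketch of the atomic-mode bookkeeping described above. The names are illustrative assumptions, not the actual AtomicSimpleCPU code:

    // Sketch of the atomic-mode rule: once a locked load executes, keep
    // running instructions until the matching locked store is seen.
    // Illustrative only; these are not gem5's actual names.
    struct MemRequest {
        bool isLoad;
        bool isLocked;  // corresponds to the Request::LOCKED flag
    };

    class AtomicCpu {
        bool inLockedSection = false;

      public:
        void noteMemAccess(const MemRequest &req) {
            if (req.isLocked)
                inLockedSection = req.isLoad;  // load enters, store leaves
        }

        void tick() {
            // Normally one instruction per tick, but never yield control
            // in the middle of a locked load/store pair.
            do {
                executeNextInstruction();  // calls noteMemAccess() for mem ops
            } while (inLockedSection);
            scheduleNextTick();
        }

      private:
        void executeNextInstruction() { /* ... */ }
        void scheduleNextTick() { /* ... */ }
    };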
>> Quoting Meredydd Luff <[email protected]>:
>>
>> It appears that the CAS (LOCK; CMPXCHGx) instruction doesn't do what
>> it says on the tin, at least using the O3 model and X86_SE. When I run
>> the following code (inside a container that runs this code once on
>> each of four processors):
>>
>> volatile unsigned long *x;   /* shared counter */
>> [...]
>> for (a = 0; a < 1000; a++) {
>>     while (lastx = *x, oldx = cas(x, lastx, lastx+1), oldx != lastx)
>>         ;
>> }
>>
>> ...I get final x values of 1200 or so (rather than 4000, as would
>> happen if the compare-and-swap were atomic). This is using the
>> standard se.py and a fresh checkout of the gem5 repository - my
>> command line is:
>>
>> build/X86_SE/m5.opt configs/example/se.py -d --caches -n 4 -c /path/to/my/binary
>>
>> Is this a known bug? Looking at the x86 microcode, it appears that the
>> relevant microops are ldstl and stul. Their only difference from what
>> appear to be their unlocked equivalents (ldst and st) is the addition
>> of the Request::LOCKED flag. A quick grep indicates that that LOCKED
>> flag is only accessed by the Request::isLocked() accessor function,
>> and that isLocked() is not referenced anywhere except twice in
>> cpu/simple/atomic.cc.
>>
>> Unless I'm missing something, it appears that atomic memory accesses
>> are simply not implemented. Is this true?
>>
>> Meredydd
>>
>> PS - This is the CAS I'm using:
>>
>> static inline unsigned long cas(volatile unsigned long *ptr,
>>                                 unsigned long old, unsigned long _new)
>> {
>>     unsigned long prev;
>>     asm volatile("lock;"
>>                  "cmpxchgq %1, %2;"
>>                  : "=a"(prev)
>>                  : "q"(_new), "m"(*ptr), "0"(old)
>>                  : "memory");
>>     return prev;
>> }
>>
>> PPS - I searched around this issue, and the only relevant thing I
>> found was a mailing list post from last year, indicating that ldstl
>> and stul were working for someone (no indication that was using O3,
>> though):
>> http://www.mail-archive.com/[email protected]/msg07297.html
>> This would indicate that at least one CPU model does support atomicity
>> - but even looking in atomic.cc, I can't immediately see why that
>> would work!
>>
>> There is some code for handling a flag called
>> Request::MEM_SWAP_COND/isCondSwap(), but it appears to be generated
>> only by the SPARC ISA, and examined only by the simple timing and
>> atomic models.
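As a side note, the same test could be expressed with GCC's __sync builtin, which on x86-64 compiles down to the same lock; cmpxchgq sequence; if both versions lose updates, the problem is in the simulated memory system rather than the inline asm. A minimal sketch:

    /* Equivalent increment loop using GCC's __sync builtin CAS; on
       x86-64 this emits the same lock; cmpxchgq as the asm above. */
    static inline void atomic_inc(volatile unsigned long *ptr)
    {
        unsigned long old;
        do {
            old = *ptr;
        } while (__sync_val_compare_and_swap(ptr, old, old + 1) != old);
    }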
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
