Hi Meredydd. I'd say this isn't a bug per se, but it is wrong.
Basically the support for locking memory operations is incomplete.
The way this is supposed to work is that a load with the LOCKED flag
set will lock a chunk of memory, and then a subsequent store with the
LOCKED flag set will unlock it. Every store with LOCKED set must be
preceded by a load with the same flag set; you can think of the load as
acquiring a mutex and the store as releasing it.
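To make the analogy concrete, here is a rough C++ sketch (purely
illustrative, not actual gem5 code, and the function names are made up)
of how a LOCKED load/store pair brackets a read-modify-write the way a
mutex acquire/release would; the ldstl/stul names are the x86 microops
Meredydd mentions below:

#include <cstdint>
#include <mutex>

std::mutex mem_lock;   // stands in for the locked chunk of memory

uint64_t locked_load(volatile uint64_t *addr) {
    mem_lock.lock();               // LOCKED load: acquire
    return *addr;
}

void locked_store(volatile uint64_t *addr, uint64_t val) {
    *addr = val;
    mem_lock.unlock();             // LOCKED store: release
}

// A LOCK; CMPXCHG-style read-modify-write then decomposes into the pair:
uint64_t locked_cmpxchg(volatile uint64_t *addr,
                        uint64_t expected, uint64_t desired) {
    uint64_t cur = locked_load(addr);                       // like ldstl
    locked_store(addr, cur == expected ? desired : cur);    // like stul
    return cur;
}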
In atomic mode, because gem5 is single threaded and atomic memory
accesses complete immediately, the only thing you need to do to keep
locked memory accesses from being interrupted is to make sure the CPU
keeps control until the locked section is complete.
To do that we just keep track of whether or not we've executed a
locked load and don't stop executing instructions until we see a
locked store. This is what you're seeing in the atomic mode CPU.
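For illustration only, that bookkeeping amounts to something like the
following sketch; the names here are hypothetical and do not match the
actual code in cpu/simple/atomic.cc:

struct AtomicCpuSketch {
    bool locked = false;   // a locked load seen, its matching store still pending

    // Called for each memory microop the model executes.
    void noteMemAccess(bool isLoad, bool lockedFlag) {
        if (lockedFlag)
            locked = isLoad;   // locked load opens the window, locked store closes it
    }

    // In atomic mode the CPU simply refuses to hand control back to the
    // event queue while a locked sequence is open, so nothing else can run
    // between the locked load and the locked store.
    void tick() {
        do {
            executeOneInstruction();
        } while (locked);
    }

    // Stub standing in for fetch/decode/execute; a real model would call
    // noteMemAccess() from here for every memory microop.
    void executeOneInstruction() {}
};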
In timing mode, which all the other CPUs (including the timing simple
CPU) use, something more complex is needed because memory accesses
take "time" and other things can happen while the CPU waits for a
response. In that case, the locking would have to actually happen in
the memory system and the various components (caches, memory, or
something else) would have to keep track of what areas of memory (if
any) are currently locked. This is the part that isn't yet implemented.
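Purely as a sketch of what that missing piece might look like (none of
this exists in gem5, and the names are made up), the memory system would
track locked regions and stall conflicting requests until the matching
locked store releases them:

#include <cstdint>
#include <map>

class LockedRangeTracker {
    std::map<uint64_t, uint64_t> lockedRanges;   // start address -> size

  public:
    // A LOCKED load registers the region it touches.
    void lock(uint64_t addr, uint64_t size) { lockedRanges[addr] = size; }

    // The matching LOCKED store releases it.
    void unlock(uint64_t addr) { lockedRanges.erase(addr); }

    // Any other request overlapping a locked region would have to be
    // stalled or retried until that region is released.
    bool conflicts(uint64_t addr, uint64_t size) const {
        for (const auto &r : lockedRanges) {
            if (addr < r.first + r.second && r.first < addr + size)
                return true;   // address ranges overlap
        }
        return false;
    }
};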
So in summary: yes, it is known not to work properly, but I wouldn't
call it a bug; I'd say it's just not finished yet.
Gabe
Quoting Meredydd Luff <[email protected]>:
It appears that the CAS (LOCK; CMPXCHGx) instruction doesn't do what
it says on the tin, at least using the O3 model and X86_SE. When I run
the following code (inside a container that runs this code once on
each of four processors):
volatile unsigned long *x;
[...]
for(a=0; a<1000; a++) {
    while(lastx = *x, oldx = cas(x, lastx, lastx+1), oldx != lastx);
}
...I get final x values of 1200 or so (rather than 4000, as would
happen if the compare-and-swap were atomic). This is using the
standard se.py, and a fresh checkout of the gem5 repository - my
command line is:
build/X86_SE/m5.opt configs/example/se.py -d --caches -n 4 -c /path/to/my/binary
Is this a known bug? Looking at the x86 microcode, it appears that the
relevant microops are ldstl and stul. Their only difference from what
appears to be their unlocked equivalents (ldst and st) is the addition
of the Request::LOCKED flag. A quick grep indicates that the LOCKED
flag is only accessed by the Request::isLocked() accessor function,
and that isLocked() is not referenced anywhere except twice in
cpu/simple/atomic.cc.
Unless I'm missing something, it appears that atomic memory accesses
are simply not implemented. Is this true?
Meredydd
PS - This is the CAS I'm using:
static inline unsigned long cas(volatile unsigned long* ptr, unsigned
long old, unsigned long _new)
{
unsigned long prev;
asm volatile("lock;"
"cmpxchgq %1, %2;"
: "=a"(prev)
: "q"(_new), "m"(*ptr), "0"(old)
: "memory");
return prev;
}
PPS - I searched around this issue, and the only relevant thing I
found was a mailing list post from last year indicating that ldstl
and stul were working for someone (no indication that they were using
O3, though): http://www.mail-archive.com/[email protected]/msg07297.html
This would indicate that at least one CPU model does support atomicity
- but even looking in atomic.cc, I can't immediately see why that
would work!
There is some code for handling a flag called
Request::MEM_SWAP_COND/isCondSwap(), but it appears to be generated
only by the SPARC ISA, and examined only by the simple timing and
atomic models.
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users