Oh, also, POWER is in the same boat as MIPS: there's no FS build, so all of its FS entries should be red.
Gabe

On 09/18/11 18:59, Gabe Black wrote:
> I think X86 is excessively red. The timing simple CPU should be yellow for
> SE and FS, uniprocessor and multiprocessor, except for the small caveat that
> atomic accesses aren't atomic. The implementation is always going to be
> incomplete, because X86 is so large and frequently ambiguous and because
> only a fraction of it is actually useful. Marking it as "definitely does not
> work" is a bit draconian. O3 support in SE should be the same, and I'd say
> O3 in FS should be orange.
>
> On the other hand, MIPS isn't red enough. There is no MIPS_FS target because
> one wouldn't compile as of today, so everything FS should be red.
>
> I don't know the status of Ruby on anything, so I can't comment on those.
>
> Gabe
>
> On 09/18/11 18:17, Steve Reinhardt wrote:
>> Yeah, whether you call it a bug or an unimplemented feature, it still
>> doesn't work... it's definitely a bug that that isn't documented, though.
>> I updated the status matrix to reflect this problem:
>> http://gem5.org/Status_Matrix
>>
>> (I also did a bunch of general editing on the status matrix... Gabe, you
>> may want to check it out and see what you think.)
>>
>> Note that this is a problem only in the "classic" m5 cache models; Ruby
>> does support x86 locking. However, Ruby doesn't support the O3 LSQ probes
>> used to enforce stronger consistency models, so that gets you TimingSimple
>> CPUs but not O3 CPUs.
>>
>> Adding locked RMW access to the classic caches is doable, but not
>> completely trivial... basically, if a snoop (probe) arrives that would
>> downgrade access to a locked block, that snoop has to be deferred and
>> processed after the lock is released. There's already support in the
>> protocol for deferring snoops that hit on an MSHR, but the details of how
>> that gets extended to handle locked blocks are TBD. I expect the solution
>> involves either (1) adding a bit to the tags to mark locked blocks or
>> (2) allocating an MSHR or MSHR-like structure for each locked block. There
>> are pros and cons to each. I don't have time to implement this myself, but
>> if someone else wants to take a crack at it, I'd be glad to consult.
>>
>> Do we have an issue in O3 with speculatively issuing the read part of a
>> locked RMW (which locks the block) but then, due to a squash, not issuing
>> the write that unlocks it? That seems like a tricky bit... I don't know
>> whether Ruby handles this or not.
>>
>> Steve
>>
>> On Sat, Sep 17, 2011 at 4:39 PM, Gabriel Michael Black
>> <[email protected]> wrote:
>>
>> Hi Meredydd. I'd say this isn't a bug per se, but it is wrong. Basically,
>> the support for locking memory operations is incomplete.
>>
>> The way this is supposed to work is that a load with the LOCKED flag set
>> will lock a chunk of memory, and then a subsequent store with the LOCKED
>> flag set will unlock it. All stores with LOCKED set must be preceded by a
>> load with it set. You can think of the load as acquiring a mutex and the
>> store as releasing it.
>>
>> In atomic mode, because gem5 is single threaded and because atomic memory
>> accesses complete immediately, the only thing you need to do to make sure
>> locked memory accesses aren't interrupted by anything is to make sure the
>> CPU keeps control until the locked section is complete. To do that, we just
>> keep track of whether or not we've executed a locked load and don't stop
>> executing instructions until we see a locked store. This is what you're
>> seeing in the atomic mode CPU.
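[Editor's note: for readers unfamiliar with gem5's atomic mode, the bookkeeping
described in the paragraph above can be sketched roughly as follows. This is a
self-contained toy in C++, not the actual cpu/simple/atomic.cc source; the names
(ToyInst, runOneTick, lockedSectionOpen) are invented for illustration.]

    #include <cstddef>
    #include <vector>

    // Toy stand-in for a decoded instruction; `locked` plays the role of a
    // memory request carrying the Request::LOCKED flag.
    struct ToyInst {
        bool isLoad = false;
        bool isStore = false;
        bool locked = false;
    };

    // Execute instructions back to back, refusing to hand control of the
    // single-threaded simulation to anything else while a locked
    // load ... store section is open.
    void runOneTick(const std::vector<ToyInst> &stream)
    {
        bool lockedSectionOpen = false;
        std::size_t pc = 0;

        do {
            if (pc >= stream.size())
                break;
            const ToyInst &inst = stream[pc++];

            // ... perform the (immediately completing) atomic-mode access ...

            if (inst.locked && inst.isLoad)
                lockedSectionOpen = true;   // "acquire the mutex"
            if (inst.locked && inst.isStore)
                lockedSectionOpen = false;  // "release the mutex"
        } while (lockedSectionOpen);

        // Only once the locked store has executed does the CPU stop and let
        // other simulation objects run.
    }

[The real check lives in the atomic simple CPU, which is where the
Request::isLocked() references in cpu/simple/atomic.cc mentioned later in this
thread come in.]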
>> In timing mode, which is what all the other CPUs use, including the timing
>> simple CPU, something more complex is needed, because memory accesses take
>> "time" and other things can happen while the CPU waits for a response. In
>> that case the locking would have to actually happen in the memory system,
>> and the various components (caches, memory, or something else) would have
>> to keep track of which areas of memory (if any) are currently locked. This
>> is the part that isn't implemented yet.
>>
>> So, in summary: yes, it is known not to work properly, but I wouldn't call
>> it a bug; I'd say it just isn't finished yet.
>>
>> Gabe
>>
>> Quoting Meredydd Luff <[email protected]>:
>>
>> It appears that the CAS (LOCK; CMPXCHGx) instruction doesn't do what it
>> says on the tin, at least using the O3 model and X86_SE. When I run the
>> following code (inside a container that runs it once on each of four
>> processors):
>>
>>     volatile unsigned long *x;
>>     [...]
>>     for (a = 0; a < 1000; a++) {
>>         while (lastx = *x, oldx = cas(x, lastx, lastx + 1), oldx != lastx)
>>             ;
>>     }
>>
>> ...I get final x values of 1200 or so (rather than 4000, as would happen
>> if the compare-and-swap were atomic). This is using the standard se.py and
>> a fresh checkout of the gem5 repository - my command line is:
>>
>>     build/X86_SE/m5.opt configs/example/se.py -d --caches -n 4 -c /path/to/my/binary
>>
>> Is this a known bug? Looking at the x86 microcode, it appears that the
>> relevant microops are ldstl and stul. Their only difference from what
>> appear to be their unlocked equivalents (ldst and st) is the addition of
>> the Request::LOCKED flag. A quick grep indicates that that LOCKED flag is
>> only accessed through the Request::isLocked() accessor function, and that
>> isLocked() is not referenced anywhere except twice in cpu/simple/atomic.cc.
>>
>> Unless I'm missing something, it appears that atomic memory accesses are
>> simply not implemented. Is this true?
>>
>> Meredydd
>>
>> PS - This is the CAS I'm using:
>>
>>     static inline unsigned long
>>     cas(volatile unsigned long *ptr, unsigned long old, unsigned long _new)
>>     {
>>         unsigned long prev;
>>         asm volatile("lock;"
>>                      "cmpxchgq %1, %2;"
>>                      : "=a"(prev)
>>                      : "q"(_new), "m"(*ptr), "0"(old)
>>                      : "memory");
>>         return prev;
>>     }
>>
>> PPS - I searched around this issue, and the only relevant thing I found
>> was a mailing list post from last year indicating that ldstl and stul were
>> working for someone (no indication that they were using O3, though):
>> http://www.mail-archive.com/[email protected]/msg07297.html
>> This would indicate that at least one CPU model does support atomicity -
>> but even looking at atomic.cc, I can't immediately see why that would work!
>>
>> There is some code for handling a flag called Request::MEM_SWAP_COND /
>> isCondSwap(), but it appears to be generated only by the SPARC ISA and
>> examined only by the simple timing and atomic models.
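[Editor's note: to make the missing piece concrete, the timing-mode locking
Gabe describes and the snoop deferral Steve outlines could look roughly like
the toy structure below. This is a sketch under assumptions, not gem5 code:
LockTable, Snoop, and the other names are invented for illustration, and a
real implementation would live in the classic cache, either as a bit in the
tags or as an MSHR-like entry per Steve's two options.]

    #include <cstdint>
    #include <deque>
    #include <unordered_set>

    using Addr = std::uint64_t;

    // Toy stand-in for an incoming snoop/probe that would downgrade a block.
    struct Snoop {
        Addr blockAddr;
        // ... source, required coherence state, etc. ...
    };

    class LockTable {
        std::unordered_set<Addr> lockedBlocks;  // option (1) would be a tag bit instead
        std::deque<Snoop> deferredSnoops;       // snoops held back until an unlock

      public:
        // A locked load reaches the cache: remember that the block is locked.
        void lockedLoad(Addr block) { lockedBlocks.insert(block); }

        // The matching locked store arrives: unlock and replay deferred snoops.
        void lockedStore(Addr block)
        {
            lockedBlocks.erase(block);

            std::deque<Snoop> stillDeferred;
            for (const Snoop &s : deferredSnoops) {
                if (s.blockAddr == block) {
                    // ... hand s back to the normal snoop-handling path here ...
                } else {
                    stillDeferred.push_back(s);  // its block is still locked
                }
            }
            deferredSnoops.swap(stillDeferred);
        }

        // Called when a snoop arrives: returns true if it must be deferred
        // because acting on it would downgrade a locked block.
        bool maybeDefer(const Snoop &s)
        {
            if (lockedBlocks.count(s.blockAddr)) {
                deferredSnoops.push_back(s);
                return true;
            }
            return false;
        }
    };

[Even with something like this in the caches, the O3 question Steve raises
above still stands: a squashed locked load with no matching locked store would
leave the block locked forever unless squash handling also releases it.]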
