Hi Jae-eon,

The catch is that x86 locked operations are different from the existing atomics. The original m5 cache just supported LL/SC for Alpha, where of course there is no actual locking occurring. When we added SPARC we added the handful of specific atomic operations that SPARC supports, such as compare-and-swap. Unfortunately x86 doesn't have just a handful of atomic operations; you can make nearly any read-modify-write instruction with a memory operand atomic by adding a lock prefix. That makes performing the atomic operation in the cache impractical.

Instead, x86 uses the LOCKED flag to delineate a load/store pair that needs to be treated as an atomic RMW. Because the old value is sent to the core and the core sends the new value back, the core itself can implement any atomic operation it wants.

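
To make the load/store-pair idea concrete, here is a minimal, standalone C++17 sketch. It is not gem5 code: `ToyMemory`, `lockedLoad`, `lockedStore`, and `atomicRMW` are hypothetical names invented for illustration, and the toy tracks only a single outstanding locked pair. The point it tries to show is that the memory side only has to bracket the pair, while the operation itself lives entirely on the core side.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>

// Hypothetical names throughout; none of these are gem5 classes or flags.
using Addr = std::uint64_t;

class ToyMemory {
  public:
    // First half of the pair: hand the old value to the core and
    // remember that this address is in the middle of a locked RMW.
    std::uint64_t lockedLoad(Addr addr) {
        assert(!locked_ && "toy model: one outstanding locked pair only");
        locked_ = addr;
        return data_[addr];  // operator[] value-initializes missing entries to 0
    }

    // Second half of the pair: accept the new value and release the lock.
    void lockedStore(Addr addr, std::uint64_t value) {
        assert(locked_ && *locked_ == addr);
        data_[addr] = value;
        locked_.reset();
    }

    // Ordinary read, used here only to inspect the result.
    std::uint64_t read(Addr addr) const {
        auto it = data_.find(addr);
        return it == data_.end() ? 0 : it->second;
    }

    bool isLocked() const { return locked_.has_value(); }

  private:
    std::unordered_map<Addr, std::uint64_t> data_;
    std::optional<Addr> locked_;
};

// Core-side RMW: the memory never learns what the operation is, so any
// function of the old value can be made atomic this way.
std::uint64_t atomicRMW(ToyMemory &mem, Addr addr,
                        const std::function<std::uint64_t(std::uint64_t)> &op) {
    std::uint64_t old = mem.lockedLoad(addr);
    mem.lockedStore(addr, op(old));
    return old;
}
```

With this shape, a locked add and a locked xor are both just different `op` arguments to `atomicRMW`; nothing on the memory side has to change when a new atomic operation is needed.
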
The good news is that these load/store pairs always occur as two microops within the scope of a single macroinstruction, so you're always guaranteed to see the store shortly after you see the load. The atomic CPU implements this pretty easily because it doesn't care about timing and memory accesses are atomic, so it just completes the entire RMW sequence within one simulation event (one call to tick()).

Unfortunately, in timing mode that isn't feasible. So what needs to happen is that, once a load with the LOCKED bit set is handled in timing mode, the cache needs to mark that block and defer any invalidations to that block until after the corresponding store has completed. I haven't looked at the code that closely, but I don't think that should be terribly hard.

One complication is the case where you have a shared cache (most typically this would be a multithreaded CPU hooked up to a single L1); you want to make sure that locked accesses from the different threads don't conflict, so you'd need to record the source of the LOCKED load request and make sure that accesses from other sources are deferred as well.

Steve

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
