Hi all, I've been working on integrating McVerSi [1] (a framework for simulation-based memory consistency verification) into the latest Gem5 version; in doing so I have also discovered new bugs (see end of email).
At [1] I provide a patch and instructions on how to use McVerSi with the current Gem5 version. There is a chance that (due to differing checkpoints, timing, etc.) other bugs may be discovered than the mentioned ones here. Ultimately, it would be good to have this in upstream, but currently doesn't appear clear to me what the best approach for this is. It was raised at HPCA (during discussion with Brad and David), that it might be nice to have for regression testing. Comments would be appreciated. The patch at [1] needs some work to be suitable to be accepted into upstream; does it still make sense to upload to RB to discuss? I am also unsure if Gem5 currently provides a clean way to implement the hooks required, and would rather not need to change other subsystems. (A small portion of the patch at [1] is already on the RB: http://reviews.gem5.org/r/3398/) Furthermore, it requires a substantial external component (https://github.com/melver/mc2lib) to live in ext/; should this be put onto RB at the same time? Many thanks, -- Marco Elver Research Associate | University of Edinburgh | ac.marcoelver.com [1] http://ac.marcoelver.com/research/mcversi The following are new bugs found; neither look particularly easy to fix: 1. In the classic memory system, same address read->read reordering (SC PER LOCATION violation) can occur with O3CPU (found while using ARM ISA, but probably affects all other ISAs requiring at least same address read->read ordering): Consider the case of a replacement (Cache::allocateBlock), which chooses a block corresponding to a prior request that has already been satisfied, i.e. no longer in MSHR. However, the request is still not committed in the LQ on CPU side, and if we don't send an invalidation to the CPU side on a replacement we may miss any potential external invalidations after. This bug is found after running for about 30 min. 2. In O3CPU LSQ, write->read reordering (SC PER LOCATION violation) can occur in the same thread when using virtual address synonyms (found with X86+Ruby, but will affect all other configurations that use the O3CPU): When forwarding in-flight writes from the SQ, only virtual addresses are checked; if two virtual addresses map to the same physical address, no forwarding takes place. I realize this might be a rare use-case for the kinds of workloads being run on Gem5 right now? Found with first generated test, i.e. instantly. -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. _______________________________________________ gem5-dev mailing list gem5-dev@gem5.org http://m5sim.org/mailman/listinfo/gem5-dev