The easiest way I've found is to isolate the address that's causing the problem, then re-run with cache and bus tracing on and grep out the accesses to that cache block. Sometimes that shows the problem directly, and if not at the very least that should highlight when things start to go awry and give you cycle counts where you can set breakpoints for gdb.
If you can't easily deduce the problem address from the application (e.g., if you're stuck waiting for a lock that never gets released, what's the address of the lock), you should try the memory tester; if it encounters a coherence problem it will print the address out. (I suggest trying the memory tester anyway as it's much nicer to have it tell you about a coherence problem than to deduce that there's a problem from application misbehavior.) There's also a primitive facility for printing out the global state of a cache block that's discussed on pp. 147-148 of our latest tutorial. (I thought it was described on the wiki too but I can't find it there.) Steve On Thu, Feb 12, 2009 at 12:07 AM, Rick Strong <[email protected]> wrote: > Hi all, > > Recent experience with the mesh+directory coherence patch and the > parsec parallel benchmarks has been a coherency nightmare. DMA sometimes > has weird errors, certain functions get stuck waiting for cache accesses > and other nightmare-ish scenarios. > > I am wondering if anyone knows of any good ways of debugging memory > coherency problems for directory coherence + mesh. > > I have two ideas: > > (1) Dump all the accesses out to a trace and then look for the time of > each cache request and make sure they are all satisfied (you could also > use a global system cache-coherency class in the M5 simulator to do > this verficiation). This makes sure the system isn't getting hung up on > a memory access. In addition, you look at the access time ranges for a > given cache line and look for intersections in the ranges for two > different cpus. This would flag possible coherency scenarios (although > the false positive rate would be high for detecting a problem). The more > difficult case is figuring out if a cache entry is modified in a local > cache, a remote cache does a read and for some reason it gets its copy > from somewhere else besides the local cache. > > (2) Make a visual display for cache accesses between the various levels > of the caches like in wireshark flow-model for tcp with time > (picoseconds, nanoseconds) on the y-axis and different caches on the x > axis. > > From my perspective, option 2 is easy but still requires visual > verification on my part. Option 1 is more difficult but is automated. > Are there any other methods for verifying a coherency protocol? > > Also, are there any other models for directory coherence available? I > wouldn't mind if it is an inaccurate hack that roughly mimicks directory > coherence. > > Best, > -Rick > > > _______________________________________________ > m5-users mailing list > [email protected] > http://m5sim.org/cgi-bin/mailman/listinfo/m5-users >
_______________________________________________ m5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
