All,

I am writing to discuss a potential bug in gem5 related to snoop response behavior in classic memory mode. Please let me know if I am overlooking something. The bug involves a snoop request pointer being deleted prematurely, before the request has been handled by all cores in the system, which triggers assertions about the req pointer being null.
I have implemented a D-NUCA system in classic mode with clusters: a few cores share one L2, other cores share a separate L2, and a D-NUCA at the L3 level takes the place of a traditional crossbar. This breaks the "instant" behavior of snoop responses passing from one L2 to another over a crossbar. That is, a snoop response from a core in cluster A takes some time to traverse the D-NUCA level before it reaches a separate cluster (call it cluster B).

In a clustered implementation with multiple L1s per cluster, the following scenario causes the problem. Cores 0, 1, 2, and 3 are in one cluster, each with its own L1, and they share an L2. Core 4 is in a separate cluster with its own L2, and the clusters are connected through a D-NUCA L3.

1. Core 4 issues a read request that hits a dirty line (M state) in Core 1. A response is initiated, and the dirty data is now in flight.
2. Almost immediately afterwards, Core 0 issues an upgrade request, whose snoop hits the same dirty line (now in O state) in Core 1.
3. Core 1 initiates the response back to Core 0, but because this is an upgrade, the upgrade request is also broadcast to all cores.
4. The broadcast upgrade request reaches Core 4 before the snoop response carrying Core 4's data, so it is appended to the MSHR for that line in Core 4's L2.
5. Core 0 handles Core 1's upgrade response BEFORE Core 4 receives its data response, because the snoop response data is still traversing the D-NUCA L3 back into Core 4's cluster.
6. When Core 0 receives its upgrade response, it deletes the snoop req. However, that req has already been appended to the MSHR in Core 4's cluster.
7. When Core 4 finally receives its snoop data and handles the deferred snoop, an assertion fires because the req pointer for the appended snoop no longer exists.

Fundamentally, in a clustered system with a D-NUCA, what is to prevent inter-cluster snoops from being resolved before intra-cluster snoops? How can we ensure that req pointers are not deleted until ALL MSHRs to which they may have been appended have been handled?
Cliff Maxwell

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
