All,

I am writing to discuss a potential bug in gem5 related to snoop response
behavior in classic memory mode. Please let me know if I am overlooking
something. A snoop request pointer appears to be deleted before every core in
the system has handled the snoop, which triggers assertions that the req
pointer is null.

I have implemented a D-NUCA system in classic with clusters: a few cores tied
to one L2, other cores tied to a separate L2, and a D-NUCA at the L3 level
instead of a traditional crossbar. This breaks the "instant" behavior of snoop
responses from one L2 to the next on a crossbar, i.e., a snoop response from a
core in cluster A takes some time to traverse the D-NUCA level and arrive in a
separate cluster (let's call it cluster B).

So, in a cluster-like implementation with multiple L1s in a single cluster,
here is the scenario that is causing a problem:

Cores 0, 1, 2, and 3 are in one cluster, each with an L1, and share an L2. Core
4 is in a separate cluster with its own L2, and the clusters are connected
through a D-NUCA L3.

Core 4 issues a read request which hits a dirty line (M state) in Core 1. A
response is initiated, and dirty data is now in flight.
Almost immediately after, Core 0 issues a snoop (upgrade request), and it hits
the dirty line (now O state) in Core 1. Core 1 initiates the response to be
sent back to Core 0, but because it is an upgrade, the request is broadcast to
all cores. This upgrade request beats the snoop response to Core 4 and is
appended to Core 4's L2 MSHR for that line.

Further, Core 1's upgrade response gets handled in Core 0, but this happens
BEFORE Core 4 gets its data response, as the snoop response data is taking
time traversing the D-NUCA L3 back into Core 4's cluster. When Core 0 receives
its upgrade response, it deletes the snoop req. But this snoop req has already
been appended in Core 4's cluster, so when Core 4 finally gets its snoop data
and handles the deferred snoop, an assertion fires because the req that the
appended snoop points to no longer exists.
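
To make the ordering concrete, here is a minimal standalone sketch of the
lifetime problem. It is not gem5 code: Req, DeferredSnoop, and
reqAliveAtDeferredService are hypothetical names, and std::weak_ptr stands in
for the non-owning pointer copy so the stale access can be observed safely
rather than crashing.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory request; not gem5's Request class.
struct Req {
    unsigned long addr;
};

// Hypothetical deferred-snoop entry in an MSHR. It only observes the
// request, mirroring a pointer copy that does not own the object.
struct DeferredSnoop {
    std::weak_ptr<Req> req;
};

// Replays the ordering from the scenario and reports whether the request
// is still alive when Core 4's cluster services its deferred snoop.
bool reqAliveAtDeferredService()
{
    // Core 0's side is the sole owner of the upgrade request.
    auto upgradeReq = std::make_shared<Req>(Req{0x1000});

    // The upgrade snoop reaches Core 4's L2 first and is deferred behind
    // the outstanding read in that L2's MSHR.
    std::vector<DeferredSnoop> core4L2Mshr;
    core4L2Mshr.push_back(DeferredSnoop{upgradeReq});

    // Core 0 receives its upgrade response and frees the request while
    // Core 4's data is still crossing the L3.
    upgradeReq.reset();

    // Core 4's data finally arrives and the deferred snoop is serviced.
    return !core4L2Mshr.front().req.expired();
}
```

With this ordering the function returns false: the deferred entry outlives the
request it refers to, which is the condition the assertion catches.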

Fundamentally, in a clustered system with D-NUCA, what is to prevent
inter-cluster snoops from getting resolved before intra-cluster snoops? How can
we ensure that req pointers aren't deleted until ALL potentially appended MSHR
entries have been handled?
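
One possible direction, sketched outside of gem5 and only as an assumption
about what a fix could look like, is shared ownership: every deferred entry
holds its own counted reference, so the request is freed only when the last
holder lets go. SnoopReq, OwningDeferredSnoop, and ownersAfterRequesterRelease
are hypothetical names for illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memory request; not gem5's Request class.
struct SnoopReq {
    unsigned long addr;
};

// Hypothetical deferred target that co-owns the request it will service.
struct OwningDeferredSnoop {
    std::shared_ptr<SnoopReq> req;
};

// Same ordering as the failing scenario, but the MSHR entry shares
// ownership. Returns the number of owners left after the requester
// drops its handle.
long ownersAfterRequesterRelease()
{
    auto upgradeReq = std::make_shared<SnoopReq>(SnoopReq{0x1000});

    // Core 4's L2 defers the snoop and takes a counted reference.
    std::vector<OwningDeferredSnoop> core4L2Mshr;
    core4L2Mshr.push_back(OwningDeferredSnoop{upgradeReq});

    // Core 0 handles its upgrade response and releases its handle.
    upgradeReq.reset();

    // The request is still valid when the deferred snoop is serviced.
    assert(core4L2Mshr.front().req != nullptr);
    return core4L2Mshr.front().req.use_count();
}
```

Here the count is 1 after Core 0 releases the request, so the deferred snoop
in Core 4's cluster can still be handled. The trade-off is a reference-count
update on every copy of the handle, and it does not by itself answer the
ordering question of inter-cluster versus intra-cluster snoops.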

Cliff Maxwell


_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
