[gem5-dev] Invalid DMA Transition in MOESI_AMD_Base-dir for GPU Pannotia Benchmarks

Matt Sinclair Sat, 16 Nov 2019 08:47:56 -0800

 (Resending as my email got bounced from the gem5 mailing list)

Hi Brad & Tony,

Gaurav (CC'd) and I have been attempting to run the Pannotia benchmarks on
the amd-gcn3-staging branch. We've hipified all of them, and tested that
they all work on "real" GPUs, then removed the copies before running them
in the simulator. However, we're having a problem with several of the
benchmarks (FloydWarshall, Color) where they are getting Invalid DMA
Transition deadlocks in the MOESI_AMD_Base-dir protocol. We've gotten a
trace and done a bunch of digging, and identified the following pattern as
causing the deadlock:

*tl;dr: There is a deadlock in the MOESI_AMD_Base-dir protocol that relates
to a race between a CPU load and a DMA load. Is there documentation of how
the DMA should behave? Shouldn't the directory always wake up any pending
requests when the thing it was pending on completes?*

1. DMA issues a read for Block A
2. Directory receives 1, initiates Invalidation probe + sends a read to
memory because the data is currently not in any of the caches.
3. The CPU (CorePair) issues a read for Block A (before the invalidation
is received from 2) since it misses. CPU transitions from I --> I_EOS.
4. The directory receives 3, but because it is in the middle of BDR_PM, it
stalls the CPU's read request and puts it on the stall buffer:
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/src/mem/protocol/MOESI_AMD_Base-dir.sm#1086
5. CPU receives the Invalidation request and acknowledges it, but stays in
I_EOS.
   - This strikes me as strange: wouldn't we want to go to I, or at least
     to another transient state that will transition to I when everything
     is done? But this is not the key issue here.
6. Directory receives memory response, transitions from BDR_PM --> BDR_Pm
7. (GPU/other CPU receive invalidation and respond appropriately)
8. The directory receives the remaining invalidation responses and
transitions from BDR_Pm --> U.
   - This strikes me as problematic, and I believe it is the source of the
     deadlock. Shouldn't we wake up the requests on the stall buffer now,
     since the request ahead of them just completed? In any event, that is
     not what happens right now
     (https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/src/mem/protocol/MOESI_AMD_Base-dir.sm#1332),
     which leads to a deadlock.
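
To make the sequence above easier to follow, here is a minimal Python sketch of it. This is a toy model, not the SLICC state machine: the class ToyDirectory, the wake_on_completion flag, and the ack counting are all hypothetical simplifications, and only the state names (U, BDR_PM, BDR_Pm) come from the trace. It just illustrates why a request parked on the stall buffer stays there if the final transition to U never wakes it.

```python
from collections import deque


class ToyDirectory:
    """Toy model of one directory entry; NOT the real SLICC state machine."""

    def __init__(self, wake_on_completion):
        self.state = "U"
        self.pending_acks = 0
        self.stalled = deque()    # stand-in for the stall buffer
        self.completed = []       # CPU requests the directory has serviced
        self.wake_on_completion = wake_on_completion  # hypothetical switch

    def dma_read(self, n_probes):
        # Steps 1-2: DMA read arrives, probes go out, memory read is issued.
        assert self.state == "U"
        self.state = "BDR_PM"
        self.pending_acks = n_probes

    def cpu_read(self, requester):
        # Steps 3-4: a CPU read that arrives mid-transaction is stalled.
        if self.state != "U":
            self.stalled.append(requester)
        else:
            self.completed.append(requester)

    def memory_response(self):
        # Step 6: memory data arrives.
        assert self.state == "BDR_PM"
        self.state = "BDR_Pm"
        self._maybe_finish()

    def probe_ack(self):
        # Steps 5, 7, 8: probe responses trickle in.
        assert self.state in ("BDR_PM", "BDR_Pm")
        self.pending_acks -= 1
        self._maybe_finish()

    def _maybe_finish(self):
        # Step 8: data and all acks are in, so go back to U.
        if self.state == "BDR_Pm" and self.pending_acks == 0:
            self.state = "U"
            if self.wake_on_completion:
                while self.stalled:
                    self.cpu_read(self.stalled.popleft())


def run(wake_on_completion):
    d = ToyDirectory(wake_on_completion)
    d.dma_read(n_probes=2)
    d.cpu_read("CorePair")
    d.probe_ack()
    d.memory_response()
    d.probe_ack()
    return d.state, list(d.stalled), d.completed
```

In this toy, run(False) ends with the directory back in U while "CorePair" is still sitting in the stalled queue, which mirrors the hang we see. run(True) drains the queue once the DMA transaction finishes.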

There are other cases (e.g., when 1 and 3 are inverted), where the
stallbuffer will be woken up (I believe it's this one:
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/src/mem/protocol/MOESI_AMD_Base-dir.sm#1169,
but Gaurav can correct me if I grabbed the wrong one).

We tried making the "obvious" changes:

- On BDR_Pm --> U transition, call . We added it before deallocating the
TBE in this transition (
https://gem5.googlesource.com/amd/gem5/+/refs/heads/agutierr/master-gcn3-staging/src/mem/protocol/MOESI_AMD_Base-dir.sm#1334),
because that's what the transition linked above did on lines 1169-1172.
- This led to an Invalid Transition for the next DMA request on the same
address, because the B_PM state doesn't have a transition for when a DMA
request arrives. So we added that in, putting the DMA request in the stall
buffer just like the other case did for the CPU read. But then we get an
assert failure in the message buffer -- isReady() in stallMessage (which
gets called from st_stallAndWaitRequest) fails because the message is
enqueued later.
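
As a rough illustration of the assert failure described above, here is a small Python sketch of a buffer that refuses to stall a message whose ready time has not arrived yet. ToyMessageBuffer, try_stall, and the tick values are hypothetical and only mimic the symptom; this is not gem5's MessageBuffer implementation.

```python
import heapq


class ToyMessageBuffer:
    """Toy stand-in for a Ruby message buffer; hypothetical, not gem5's class."""

    def __init__(self):
        self._queue = []     # min-heap of (ready_tick, seq, msg)
        self._seq = 0        # tie-breaker so equal ticks keep FIFO order
        self.stalled = {}    # addr -> list of stalled messages

    def enqueue(self, msg, ready_tick):
        heapq.heappush(self._queue, (ready_tick, self._seq, msg))
        self._seq += 1

    def is_ready(self, now):
        # The head message is ready only once its tick has been reached.
        return bool(self._queue) and self._queue[0][0] <= now

    def stall_message(self, addr, now):
        # Mirrors the symptom: stalling requires a ready head message.
        assert self.is_ready(now), "stalling a message that is not ready yet"
        _, _, msg = heapq.heappop(self._queue)
        self.stalled.setdefault(addr, []).append(msg)


def try_stall(ready_tick, now):
    """Return True if the stall succeeds, False if the assertion fires."""
    buf = ToyMessageBuffer()
    buf.enqueue("DMA read, Block A", ready_tick)
    try:
        buf.stall_message("A", now)
    except AssertionError:
        return False
    return True
```

Here try_stall(10, 5) fails because the message only becomes ready at tick 10, while try_stall(10, 10) succeeds.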

When we got to this point, I wasn't sure of what the next step should be.
So, we were wondering:

- Is this a bug you are aware of internally?
- Do you know why the stall buffer would not be woken up in step 8 above?
- Do you have any publicly available documentation about the DMA requests,
and/or how this situation should be handled?
- Do you see anything wrong with the logic of the changes above, since
making them clearly did not solve the problem?
- I did notice that the mainline MOESI_AMD_Base-dir.sm file does not have
these DMA transitions, but I'm guessing the intent is for your branch to
eventually take that file's place, and not vice versa, where mainline has a
fix you don't have on your staging branch?

Any help you could provide would be greatly appreciated!

Regards,
Matt Sinclair
Assistant Professor
University of Wisconsin-Madison
Computer Sciences Department
cs.wisc.edu/~sinclair
