On 16.07.25 14:51, Tvrtko Ursulin wrote:
>>>>>> be disabled once GFX/SDMA is no longer active. In this particular
>>>>>> case there was a race condition somewhere in the internal handshaking
>>>>>> with SDMA which led to SDMA missing doorbells sometimes and not
>>>>>> executing the job even if there was work in the ring.
>>>>>
>>>>> Thank you, more or less what I assumed.
>>>>>
>>>>> But in this case there should be no harm in holding GFXOFF disabled
>>>>> until the job completes (like this patch)? Only a win to avoid the SMU
>>>>> communication latencies while the unit is powered on anyway.
>>>>
>>>> The extra latency is only on the CPU side, once the
>>>> amdgpu_ring_commit() is called the SDMA engine is already working.
>>>
>>> It is on the CPU side but can create bubbles in the pipeline, no? Is
>>> there no scope with AMD to have GFX and SDMA jobs depend on each other?
>>> Because, as said, I've seen some high latencies from the GFXOFF disable
>>> calls.
>>
>> The SDMA job is already executing at that point. The allow gfxoff
>> message to the firmware shouldn't come until later because it's
>> handled by a delayed work thread from end_use(). If you have multiple
>> submissions to SDMA within the delay window, the begin_use() and
>> end_use() will just be ref count handling and won't actually talk to
>> the firmware.
>
> I followed up with testing a bunch more games, and as it turns out, Cyberpunk
> 2077 is the only one which has this submission pattern where the default
> GFX_OFF_DELAY_ENABLE is regularly defeated.
>
> There, around 1.2 times per second the SDMA submissions miss that 100ms
> hysteresis and cause a CPU latency over 100us (I only measured when >100us
> and ignored the rest). Average latency is ~400us and max is ~2ms. So IMHO
> quite bad.

What exactly does Cyberpunk do to hit that? Are those SDMA page table
updates, clears or userspace submissions?

>
> And the vast majority of those latencies come from the SMU request. Only very
> rarely someone hits the mutex contention path.
>
> So that was the motivation for the RFC. I suppose I could have also proposed
> to increase the hysteresis, but holding the GFXOFF disabled for the duration
> of the job sounded preferable for power consumption.
>
> Anyway, given I only found Cyberpunk 2077 suffers from this I guess it maybe
> isn't too interesting to upstream for you guys. Then again it is limited to
> a specific old SKU so maybe it should not be that controversial either? Only
> that Christian NAKed tying it to job lifetime. So I don't know, AMD's call.

Well, what you could do is take a look at whether we could simplify the SMU
and/or adjust GFX_OFF_DELAY_ENABLE.

On the other hand, why does it help to keep GFXOFF disabled while running
the SDMA job?

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
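
For reference, the begin_use()/end_use() behaviour described in the quoted
text (ref counting plus a delayed "allow gfxoff") can be modelled in a few
lines. This is only a toy userspace sketch in Python, not amdgpu code: the
class GfxOffModel, its fields and the explicit millisecond clock are made up
for illustration, and only the 100ms delay is taken from the thread. It shows
why a submission inside the delay window is cheap, and why one that arrives
after the window has expired pays for a firmware round trip again.

```python
class GfxOffModel:
    """Toy model: ref-counted GFXOFF disable with a delayed re-enable."""

    def __init__(self, delay_ms=100):
        self.delay_ms = delay_ms     # hysteresis before GFXOFF is allowed again
        self.refcount = 0            # outstanding begin_use() calls
        self.gfxoff_allowed = True   # assumed initial state: GFXOFF allowed
        self.allow_deadline = None   # when the pending delayed "allow" fires
        self.slow_disables = 0       # begin_use() calls that hit the firmware

    def _run_delayed_work(self, now_ms):
        # Stand-in for the delayed work: once the deadline has passed,
        # GFXOFF is allowed again.
        if self.allow_deadline is not None and now_ms >= self.allow_deadline:
            self.allow_deadline = None
            self.gfxoff_allowed = True

    def begin_use(self, now_ms):
        """Return True if this call had to talk to the firmware (slow path)."""
        self._run_delayed_work(now_ms)
        self.allow_deadline = None   # cancel a still-pending delayed allow
        self.refcount += 1
        if self.gfxoff_allowed:
            self.gfxoff_allowed = False
            self.slow_disables += 1
            return True
        return False                 # ref count handling only

    def end_use(self, now_ms):
        self.refcount -= 1
        if self.refcount == 0:
            # Do not allow GFXOFF right away, only after the delay window.
            self.allow_deadline = now_ms + self.delay_ms


model = GfxOffModel(delay_ms=100)
results = []
for start_ms in (0, 50, 200):        # submission times in milliseconds
    results.append(model.begin_use(start_ms))
    model.end_use(start_ms + 1)      # each job is assumed to finish after 1ms
print(results)                       # [True, False, True]
```

The second submission lands 49ms after the first end_use() and only touches
the ref count. The third lands 149ms after the previous end_use(), so the
delayed allow has already fired and the disable goes to the firmware again,
which is the case the Cyberpunk 2077 measurements above appear to be hitting.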