On 16.07.25 14:51, Tvrtko Ursulin wrote:
>>>>>> be disabled once GFX/SDMA is no longer active.  In this particular
>>>>>> case there was a race condition somewhere in the internal handshaking
>>>>>> with SDMA which led to SDMA missing doorbells sometimes and not
>>>>>> executing the job even if there was work in the ring.
>>>>>
>>>>> Thank you, that is more or less what I assumed.
>>>>>
>>>>> But in this case there should be no harm in holding GFXOFF disabled
>>>>> until the job completes (like this patch)? Only a win to avoid the SMU
>>>>> communication latencies while unit is powered on anyway.
>>>>
>>>> The extra latency is only on the CPU side, once the
>>>> amdgpu_ring_commit() is called the SDMA engine is already working.
>>>
>>> It is on the CPU side but can create bubbles in the pipeline, no? Is
>>> there no scope with AMD to have GFX and SDMA jobs depend on each other?
>>> Because, as said, I've seen some high latencies from the GFXOFF disable
>>> calls.
>>
>> The SDMA job is already executing at that point.  The allow gfxoff
>> message to the firmware shouldn't come until later because it's
>> handled by a delayed work thread from end_use().  If you have multiple
>> submissions to SDMA within the delay window, the begin_use() and
>> end_use() will just be ref count handling and won't actually talk to
>> the firmware.
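
For readers following along, the ref counting plus delayed "allow" described above can be modeled roughly as below. This is only a user-space sketch under stated assumptions, not the amdgpu code: every name in it (gfxoff_model, model_begin_use(), model_end_use(), MODEL_DELAY_MS) is hypothetical, time is passed in by hand instead of using a real delayed work item, and a counter stands in for the slow firmware message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the driver. */
#define MODEL_DELAY_MS 100u /* assumed hysteresis, the 100ms from the thread */

struct gfxoff_model {
	unsigned int req_count;    /* outstanding "keep GFXOFF disabled" requests */
	bool gfxoff_allowed;       /* firmware was last told GFXOFF is allowed */
	bool work_pending;         /* delayed "allow" work is queued */
	uint64_t work_deadline_ms; /* when the queued work would run */
	unsigned int fw_messages;  /* count of (slow) firmware messages */
};

static void model_init(struct gfxoff_model *m)
{
	m->req_count = 0;
	m->gfxoff_allowed = true;
	m->work_pending = false;
	m->work_deadline_ms = 0;
	m->fw_messages = 0;
}

/* Run the delayed "allow" work if its deadline has passed. */
static void model_run_timers(struct gfxoff_model *m, uint64_t now_ms)
{
	if (m->work_pending && now_ms >= m->work_deadline_ms) {
		m->work_pending = false;
		if (m->req_count == 0) {
			m->gfxoff_allowed = true;
			m->fw_messages++; /* "allow GFXOFF" message */
		}
	}
}

static void model_begin_use(struct gfxoff_model *m, uint64_t now_ms)
{
	model_run_timers(m, now_ms);
	if (m->req_count == 0) {
		m->work_pending = false; /* cancel a still-pending "allow" */
		if (m->gfxoff_allowed) {
			m->gfxoff_allowed = false;
			m->fw_messages++; /* "disallow GFXOFF" message */
		}
	}
	m->req_count++; /* otherwise this is pure ref counting */
}

static void model_end_use(struct gfxoff_model *m, uint64_t now_ms)
{
	model_run_timers(m, now_ms);
	if (m->req_count == 0)
		return; /* unbalanced call, ignore in this sketch */
	if (--m->req_count == 0 && !m->gfxoff_allowed) {
		m->work_pending = true; /* defer the "allow" by the hysteresis */
		m->work_deadline_ms = now_ms + MODEL_DELAY_MS;
	}
}
```

In this model, two submissions 50ms apart cost a single firmware message in total, because the second begin_use() only cancels the pending work. A submission arriving after the window has expired pays for both the "allow" that already went out and a fresh "disallow".
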
> 
> I followed up with testing a bunch more games, and as it turns out, Cyberpunk
> 2077 is the only one which has this submission pattern where the default
> GFX_OFF_DELAY_ENABLE is regularly defeated.
> 
> There, around 1.2 times per second the SDMA submissions miss that 100ms 
> hysteresis and cause a CPU latency over 100us (I only measured when >100us 
> and ignored the rest). Average latency is ~400us and max is ~2ms. So IMHO 
> quite bad.

What exactly does Cyberpunk do to hit that? Are those SDMA page table updates, 
clears or userspace submissions?

> 
> And the vast majority of those latencies come from the SMU request. Only very 
> rarely someone hits the mutex contention path.
> 
> So that was the motivation for the RFC. I suppose I could have also proposed 
> to increase the hysteresis, but holding the GFXOFF disabled for the duration 
> of the job sounded preferable for power consumption.
> 
> Anyway, given I only found that Cyberpunk 2077 suffers from this, I guess it
> maybe isn't too interesting to upstream for you guys. Then again it is limited
> to a specific old SKU so maybe it should not be that controversial either? Only
> that Christian NAKed tying it to job lifetime. So I don't know, AMD's call.

Well, what you could do is take a look at whether we could simplify the SMU
and/or adjust GFX_OFF_DELAY_ENABLE.

On the other hand, why does it help to keep GFXOFF disabled while running the
SDMA job?

Regards,
Christian.

> 
> Regards,
> 
> Tvrtko
> 
