Am 30.06.20 um 00:46 schrieb Luben Tuikov:
On 2020-06-26 1:04 p.m., Christian König wrote:
Am 26.06.20 um 18:12 schrieb Alex Jivin:
Adding a delay between writing to UVD control register and reading from it.
This is to allow the HW to process the write command.

Signed-off-by: Alex Jivin <>
Suggested-By: Luben Tukov <>
   drivers/gpu/drm/amd/amdgpu/si.c | 6 ++++++
   1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 9d7b4ccd17b8..42cdc14fb79d 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -1435,6 +1435,12 @@ static int si_uvd_send_upll_ctlreq(struct amdgpu_device 
        /* Assert UPLL_CTLREQ */
        WREG32_P(cg_upll_func_cntl, UPLL_CTLREQ_MASK, ~UPLL_CTLREQ_MASK);
+ /* Waiting for HW to process the previous write.
+        * This is to give a chance to HW to act before
+        * the first read is done.
+        */
+       mdelay(1);
Mhm, that is most likely not a good idea.

We need to issue a read after the write to make sure that the stuff is
send out to the hardware.
Tracing down WREG32_P(), it seems to be writing and then reading the register,

Why do you think so?

What the macro should do is to read, apply mask and value and then write it back to the register:

#define WREG32_P(reg, val, mask)                \
    do {                            \
        uint32_t tmp_ = RREG32(reg);            \
        tmp_ &= (mask);                    \
        tmp_ |= ((val) & ~(mask));            \
        WREG32(reg, tmp_);                \
    } while (0)

back to back, twice over, when deferring to PCIe space, and just writel() when 
in mmio
space. (There is separate thread on this as it doesn't seem correct.)

That indeed sounds fishy, but all those registers should be in mmio space.

Adding a delay here is probably just postponing that. Do we have some
note in the documentation that this is necessary?
Flushing the write buffer (by issuing a read if necessary) is different
than waiting for the hardware to process the request.

It's not flushing the write buffer. The hardware is often build in a way that the next read is triggering the action instead of the write.

That's done because writes should be accepted by the hardware block immediately while reads have a timeout associated with it.

The current code does flush the write buffer by reading back,
and then delaying (both in the loop), which does achieve this,
but it leaves a corner case as I wrote in my review.
The corner case is that if the status
register changes to "done" while in the last "delay()"
we then check the loop invariant and exit, as opposed to reading
the register and determining that it is done successfully.

Current, all in pseudo-code:


for (count = 0; count < MAX_COUNT; count++) {
        res = read()
        if (res is success) break
        mdelay(10)              <-- if it changes here, we miss it on the last 


write()   ; assume write buffer flush
do {
        res = read()
} while (res != success && ++count < MAX_COUNT)

This solves the corner case, and ensures a time delay of 10 for
the hardware to process its job, but a delay of 1 for polling
the status, as it could be done "anytime now."

That seems to assume that the ACK bits are not immediately cleared on the write. How do you got to that conclusion?

Processing the request can take even longer than 10ms depending on what is done, which reference clock is selected, the temperature of the hardware, sleep mode etc etc...

For example switching to bypass mode usually comes immediately, e.g. you write the CNTL register and you immediately get an ack back.

But as far as I know switching from bypass back to normal means that we need to wait for both the reference, feedback and output signal to be logical low and then high again at the same time.

Depending on the frequencies we got here that can take a bit of time.

Assuming that WREG32_P() flushes the write buffer, as it seems
that it does, the idea here is to give the hardware some time to process
the request (from writing a value to it), but when polling to poll
a shorter amount of time.

Not a bad idea, but this is based on the diagnostic code and already used for nearly a decade.

In other words this code is what the hardware engineers recommended, is used *much* more often outside the driver and known to be working well.

I would've also liked to see the mutex business fixed
as well from my original patch review, as it is easy to prove
that it is always taken, so no need to embed it inside the if().

That is indeed a software problem which should be fixed.




        /* Wait for CTLACK and CTLACK2 to get asserted */
        for (i = 0; i < SI_MAX_CTLACKS_ASSERTION_WAIT; ++i) {
                uint32_t mask = UPLL_CTLACK_MASK | UPLL_CTLACK2_MASK;

amd-gfx mailing list

Reply via email to