Why?

On some AMD GPUs in some configurations, the start of the VRAM domain, as
reported by amdgpu_ttm_domain_start(adev, TTM_PL_VRAM), is placed at
address 0 during GMC init. This is a problem if, during a cursor plane
update, the cursor image bo, which is always pinned into VRAM, is placed
at offset zero of the VRAM domain, and thereby at the absolute address
afb->address == 0.

The display hw apparently doesn't like such a zero start address, at
least in native cursor mode: various checks inside DC, e.g., the high
level dc_stream_check_cursor_attributes() and the lower level DCN
version specific cursor hw programming checks, reject cursor attribute
updates with attributes->address.quad_part == 0.

User visible symptoms are seriously broken mouse cursors under both X11
and Wayland (tested with KDE/KWin, GNOME/Mutter, GDM login manager): the
mouse cursor flickers, is invisible, randomly becomes invisible, or fails
to adapt its shape to the context, e.g., when moving from a text input
field to other windows or window decorations. This makes the cursor
irritating and impossible to use.

The drm.debug=4 log shows DRM KMS debug messages of the form
"DC: Cursor address is 0!", and the general syslog prints errors like
"[drm:amdgpu_dm_plane_handle_cursor_update [amdgpu]] *ERROR* DC failed to
set cursor attributes"

I observe this bug on my dual-GPU Apple 2017 MacBookPro since Linux 4.11,
where the kernel's early EFI setup force-enables both the Intel iGPU and
the AMD dGPU. This leads to the AMD VRAM start being placed at 0x0, which
then causes massive cursor problems. On earlier kernels, only the AMD dGPU
was exposed; the Intel iGPU was disabled / hidden from Linux by the EFI
firmware. This caused the AMD GPU to place the VRAM start at the non-zero
address 0x000000F400000000, and the mouse cursor worked fine. I confirmed
with umr that the mmMC_VM_FB_LOCATION register of my Polaris 11 GPU indeed
reads back 0x0000 in the lower 16 bits in the dual-GPU case, causing
gmc_v8_0_vram_gtt_location() to set up the start of the VRAM domain at
zero. I don't know what causes the change, but most likely the UEFI
firmware somehow triggers it before main kernel boot, by calling into the
VBIOS, I guess.
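
For illustration, the relation between the register value and the VRAM
start can be modelled in a few lines of standalone C. This is only a
sketch: it assumes that the lower 16 bits of MC_VM_FB_LOCATION hold the
framebuffer base in units of 16 MiB (1 << 24 bytes), which matches the
two values quoted above. The helper names are hypothetical and are not
kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not driver code: derive the VRAM start address
 * from a raw MC_VM_FB_LOCATION value, assuming the lower 16 bits are
 * the framebuffer base in 16 MiB units.
 */
static uint64_t vram_start_from_fb_location(uint32_t fb_location)
{
	uint64_t base = fb_location & 0xFFFF;

	return base << 24;
}

/* The two cases described in the text above. */
static void vram_start_examples(void)
{
	/* Single-GPU case: lower 16 bits read back 0xF400. */
	assert(vram_start_from_fb_location(0xF400) == 0x000000F400000000ULL);

	/* Dual-GPU case: lower 16 bits read back 0x0000. */
	assert(vram_start_from_fb_location(0x0000) == 0);
}
```

With a VRAM start of zero, a bo placed at offset zero of the VRAM domain
also ends up at absolute address zero.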

There is at least one 8-month-old bug report in AMD's issue tracker
reporting the same symptoms on other AMD setups, cf.:
https://gitlab.freedesktop.org/drm/amd/-/issues/4302

So unless there is another cleaner and more reliable way to prevent the
cursor bo from being placed at address zero, or unless the display hw
is actually fine with address zero and those checks in DC are overly
cautious, this needs to be fixed.

Note that simply removing the "zero address -> reject cursor update"
checks worked on my Polaris11 with DCE 11.2 display engine, fixing the
cursor without causing any other obvious trouble. So maybe this is only
a limitation of recent DCN engine versions, or a pointless check.

How?

Add a new AMD bo placement flag which requests bo pinning / placement at
non-zero VRAM address only during amdgpu_bo_pin(). Use this flag for bo's
on the cursor plane during amdgpu_dm_plane_helper_prepare_fb().
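
The placement rule can be sketched as a standalone C model. All names
below (model_place, model_exclude_page_zero, the MODEL_* constants) are
hypothetical stand-ins for the TTM placement structure and the new flag.
The sketch assumes that fpfn is the first page frame number allowed for
the placement, so that raising it from 0 to 1 excludes only the first
page of VRAM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real flag and memory types. */
#define MODEL_FLAG_VRAM_NON_ZERO_ADDRESS (1u << 17)

enum model_mem_type { MODEL_PL_SYSTEM, MODEL_PL_TT, MODEL_PL_VRAM };

/* Simplified model of one placement entry. */
struct model_place {
	uint32_t fpfn;			/* first allowed page frame number */
	uint32_t lpfn;			/* last allowed page frame number, 0 = no limit */
	enum model_mem_type mem_type;
};

/*
 * Exclude page 0 from a VRAM placement, but only if the bo asks for it,
 * the placement has no lower bound yet, and the VRAM domain itself
 * starts at address zero. Otherwise the placement is left untouched.
 */
static void model_exclude_page_zero(struct model_place *place,
				    uint32_t bo_flags, uint64_t vram_start)
{
	if ((bo_flags & MODEL_FLAG_VRAM_NON_ZERO_ADDRESS) &&
	    place->mem_type == MODEL_PL_VRAM &&
	    place->fpfn == 0 &&
	    vram_start == 0)
		place->fpfn = 1;
}

/* Cursor bo with VRAM starting at zero: page 0 gets excluded. */
static void model_cursor_example(void)
{
	struct model_place place = { 0, 0, MODEL_PL_VRAM };

	model_exclude_page_zero(&place, MODEL_FLAG_VRAM_NON_ZERO_ADDRESS, 0);
	assert(place.fpfn == 1);
}
```

In this model a bo without the flag, a non-VRAM placement, or a VRAM
domain with a non-zero start keeps its original fpfn, so only the
affected configurations lose one page of possible placement.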

I don't know if this is the best approach. It feels hacky, but it is the
only approach I was able to implement, and it seems to work well enough.

If this is a good enough fix, it should be backported, but backporting
to earlier than Linux 6.12 might be cumbersome due to changes to the
amdgpu_bo_pin() implementation.

Signed-off-by: Mario Kleiner <[email protected]>
Tested-by: Mario Kleiner <[email protected]>
Cc: <[email protected]> # v6.12+
Cc: Harry Wentland <[email protected]>
Cc: Leo Li <[email protected]>
Cc: Alex Deucher <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c            | 11 +++++++++++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  6 ++++--
 include/uapi/drm/amdgpu_drm.h                         |  7 +++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 1fb956400696..97131fc8fbdf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -975,6 +975,17 @@ int amdgpu_bo_pin(struct amdgpu_bo *bo, u32 domain)
                if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS &&
                    bo->placements[i].mem_type == TTM_PL_VRAM)
                        bo->placements[i].flags |= TTM_PL_FLAG_CONTIGUOUS;
+
+               /* Ensure bo is never pinned at amdgpu_bo_gpu_offset() == 0
+                * for VRAM allocations, as some of the DC code does not
+                * like that, e.g., mouse cursor display image bo's.
+                */
+               if (bo->flags & AMDGPU_GEM_CREATE_VRAM_NON_ZERO_ADDRESS &&
+                   bo->placements[i].mem_type == TTM_PL_VRAM &&
+                   !bo->placements[i].fpfn &&
+                   !amdgpu_ttm_domain_start(adev, TTM_PL_VRAM)) {
+                       bo->placements[i].fpfn = 1;
+               }
        }
 
        r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 394880ec1078..cd7f53d3036c 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -959,10 +959,12 @@ static int amdgpu_dm_plane_helper_prepare_fb(struct drm_plane *plane,
                goto error_unlock;
        }
 
-       if (plane->type != DRM_PLANE_TYPE_CURSOR)
+       if (plane->type != DRM_PLANE_TYPE_CURSOR) {
                domain = amdgpu_display_supported_domains(adev, rbo->flags);
-       else
+       } else {
                domain = AMDGPU_GEM_DOMAIN_VRAM;
+               rbo->flags |= AMDGPU_GEM_CREATE_VRAM_NON_ZERO_ADDRESS;
+       }
 
        rbo->flags |= AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS;
        r = amdgpu_bo_pin(rbo, domain);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 1d34daa0ebcd..6dee7653c54e 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -181,6 +181,13 @@ extern "C" {
 #define AMDGPU_GEM_CREATE_EXT_COHERENT         (1 << 15)
 /* Set PTE.D and recompress during GTT->VRAM moves according to TILING flags. */
 #define AMDGPU_GEM_CREATE_GFX12_DCC            (1 << 16)
+/* Flag that BO must not be placed in VRAM domain at offset zero if the
+ * VRAM domain itself starts at address zero.
+ *
+ * Used internally to prevent placement of cursor image BO at that location,
+ * as the display hardware doesn't like that for hardware cursors.
+ */
+#define AMDGPU_GEM_CREATE_VRAM_NON_ZERO_ADDRESS (1 << 17)
 
 struct drm_amdgpu_gem_create_in  {
        /** the requested memory size */
-- 
2.43.0
