On Meteor Lake with a hybrid Intel/NVIDIA GPU setup, s2idle resume can
leave the CX0 PHY MSGBUS unresponsive. When this happens, the PLL
enable sequence silently fails: register writes via MSGBUS are dropped,
the PLL never locks, but the driver marks it as enabled and proceeds to
drive the pipe.

The root cause of the MSGBUS becoming unresponsive appears to be the
NVIDIA dGPU not participating in S0ix (addressed via the
NVreg_EnableS0ixPowerManagement module parameter). However, the i915
driver should handle PLL enable failures gracefully regardless of the
trigger.

This series:

  1. Fixes intel_cx0_pll_is_enabled() to check the hardware ACK bit,
     not just the driver-set REQUEST bit, so a PLL that failed to lock
     is correctly reported as disabled.

  2. Adds error propagation through the DPLL enable path: changes the
     .enable callback to return int, threads errors through
     _intel_enable_shared_dpll() and intel_dpll_enable(), and checks
     the result in hsw_crtc_enable() and ilk_pch_enable().

  3. Makes the CX0 PLL enable path return -ETIMEDOUT when the PHY
     fails to come out of reset or the PLL fails to lock.
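
For illustration only, the intended behavior can be sketched in
standalone C. This is not code from the patches: struct fake_pll, the
PLL_REQUEST/PLL_ACK bit layout, the three-try poll, and the helper names
are all hypothetical, and the hardware is simulated by a flag that
models whether the MSGBUS responds.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout; the real i915 register definitions differ. */
#define PLL_REQUEST (1u << 0)	/* set by the driver */
#define PLL_ACK     (1u << 1)	/* set by hardware once the PLL locks */

/* Hypothetical stand-in for one PLL and the bus behind it. */
struct fake_pll {
	uint32_t reg;		/* simulated PLL control register */
	bool msgbus_alive;	/* false models an unresponsive MSGBUS */
};

/* Report "enabled" only when hardware has acknowledged the request. */
static bool pll_is_enabled(const struct fake_pll *pll)
{
	uint32_t mask = PLL_REQUEST | PLL_ACK;

	return (pll->reg & mask) == mask;
}

/* Enable callback that returns int instead of void. */
static int pll_enable(struct fake_pll *pll)
{
	pll->reg |= PLL_REQUEST;

	/* Bounded poll for the ACK bit (retry count is an assumption). */
	for (int tries = 0; tries < 3; tries++) {
		if (pll->msgbus_alive)
			pll->reg |= PLL_ACK;	/* simulated hardware lock */
		if (pll->reg & PLL_ACK)
			return 0;
	}

	return -ETIMEDOUT;
}

/* Caller checks the result instead of driving the pipe blindly. */
static int crtc_enable(struct fake_pll *pll, bool *pipe_active)
{
	int ret = pll_enable(pll);

	*pipe_active = (ret == 0);
	return ret;
}
```

With msgbus_alive set to false, crtc_enable() returns -ETIMEDOUT and
pll_is_enabled() reports false even though the REQUEST bit is still set,
which is the case a REQUEST-only check would misreport as enabled.
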
Found on a Lenovo ThinkPad with an Intel Core Ultra 7 155H and an
NVIDIA RTX 2000 Ada. Kernel log messages before each crash:

  i915: Failed to bring PHY A to idle.
  i915: PHY A Read 0c70 failed after 3 retries.
  i915: Timeout waiting for DDI BUF A to get active
  i915: [CRTC:149:pipe A] flip_done timed out

Aaron Esau (3):
  drm/i915/cx0: check PLL ACK bit in intel_cx0_pll_is_enabled()
  drm/i915/dpll: add error propagation to DPLL enable path
  drm/i915/cx0: return errors from CX0 PLL enable on failure

 drivers/gpu/drm/i915/display/intel_cx0_phy.c  | 54 ++++++++----
 drivers/gpu/drm/i915/display/intel_cx0_phy.h  |  6 +-
 drivers/gpu/drm/i915/display/intel_display.c  | 10 ++-
 drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 87 ++++++++++++++-----
 drivers/gpu/drm/i915/display/intel_dpll_mgr.h |  2 +-
 .../gpu/drm/i915/display/intel_pch_display.c  |  7 +-
 6 files changed, 117 insertions(+), 49 deletions(-)

--
2.54.0