[AMD Official Use Only - AMD Internal Distribution Only]

Hi Mike,

Could you share more details about your setup and how you were able to reproduce it?

--

Regards,
Jay
________________________________
From: Mikhail Gavrilov <mikhail.v.gavri...@gmail.com>
Sent: Tuesday, May 20, 2025 5:33 AM
To: Pillai, Aurabindo <aurabindo.pil...@amd.com>; Chung, ChiaHsuan (Tom) 
<chiahsuan.ch...@amd.com>; Wu, Ray <ray...@amd.com>; Wheeler, Daniel 
<daniel.whee...@amd.com>; Deucher, Alexander <alexander.deuc...@amd.com>; 
amd-gfx list <amd-...@lists.freedesktop.org>; dri-devel 
<dri-devel@lists.freedesktop.org>; Linux List Kernel Mailing 
<linux-ker...@vger.kernel.org>; Linux regressions mailing list 
<regressi...@lists.linux.dev>
Subject: 6.15-rc6/regression/bisected - after commit f1c6be3999d2 error 
appeared: *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error

Hi,
After commit f1c6be3999d2, the following error appears:
[ 1421.701677] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[ 1421.896810] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[ 1422.088397] amdgpu 0000:03:00.0: [drm] *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error - collecting diagnostic data
[ 1426.448674] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
[ 1426.448804] amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
[ 1430.149443] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
[ 1430.149456] amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
[ 1433.846389] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
[ 1433.846400] amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
[ 1437.543718] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000012 SMN_C2PMSG_82:0x00000005
[ 1437.543727] amdgpu 0000:03:00.0: amdgpu: Failed to export SMU metrics table!
[ 1439.966738] watchdog: CPU28: Watchdog detected hard LOCKUP on cpu 28
[ 1439.966742] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nft_queue nfnetlink_queue nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr bnep sunrpc binfmt_misc amd_atl intel_rapl_msr intel_rapl_common mt7921e mt7921_common mt792x_lib mt76_connac_lib edac_mce_amd mt76 btusb btrtl btintel snd_hda_codec_realtek btbcm btmtk snd_hda_codec_generic snd_hda_scodec_component kvm_amd snd_hda_codec_hdmi mac80211 bluetooth vfat snd_hda_intel fat snd_intel_dspcfg kvm snd_intel_sdw_acpi snd_hda_codec snd_hda_core spd5118 snd_hwdep libarc4 snd_seq irqbypass snd_seq_device wmi_bmof cfg80211 r8169 rapl joydev snd_pcm snd_timer i2c_piix4 pcspkr k10temp i2c_smbus snd realtek rfkill soundcore gpio_amdpt gpio_generic loop nfnetlink zram lz4hc_compress lz4_compress amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec polyval_clmulni
[ 1439.966788]  gpu_sched nvme polyval_generic ghash_clmulni_intel drm_suballoc_helper drm_panel_backlight_quirks ucsi_ccg sha512_ssse3 nvme_core drm_buddy typec_ucsi sha256_ssse3 drm_display_helper nvme_keyring typec sha1_ssse3 nvme_auth sp5100_tco cec video wmi fuse
[ 1439.966799] irq event stamp: 235192
[ 1439.966800] hardirqs last  enabled at (235191): [<ffffffffa60012a6>] asm_exc_page_fault+0x26/0x30
[ 1439.966805] hardirqs last disabled at (235192): [<ffffffffa9ba5277>] irqentry_enter+0x57/0x60
[ 1439.966808] softirqs last  enabled at (234272): [<ffffffffa660ee39>] handle_softirqs+0x579/0x840
[ 1439.966810] softirqs last disabled at (234263): [<ffffffffa660f236>] __irq_exit_rcu+0x126/0x240
[ 1439.966813] CPU: 28 UID: 1000 PID: 209499 Comm: cc1 Tainted: G  W    L      6.15.0-rc5-01-3ce9925823c7d6bb0e6eb951bf2db0e9e182582d+ #1 PREEMPT(lazy)
[ 1439.966817] Tainted: [W]=WARN, [L]=SOFTLOCKUP
[ 1439.966818] Hardware name: ASRock B650I Lightning WiFi/B650I Lightning WiFi, BIOS 3.08 09/18/2024
[ 1439.966819] RIP: 0010:delay_halt_mwaitx+0x20/0x50

And then the system hangs after SOFTLOCKUP.

Bisect says that this is commit f1c6be3999d2

Author: Aurabindo Pillai <aurabindo.pil...@amd.com>
Date:   Wed Apr 16 11:26:54 2025 -0400

    drm/amd/display: more liberal vmin/vmax update for freesync

    [Why]
    FAMS2 expects vmin/vmax to be updated in the case when freesync is
    off, but supported. But we only update it when freesync is enabled.

    [How]
    Change the vsync handler such that dc_stream_adjust_vmin_vmax() its called
    irrespective of whether freesync is enabled. If freesync is supported,
    then there is no harm in updating vmin/vmax registers.

    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3546
    Reviewed-by: ChiaHsuan Chung <chiahsuan.ch...@amd.com>
    Signed-off-by: Aurabindo Pillai <aurabindo.pil...@amd.com>
    Signed-off-by: Ray Wu <ray...@amd.com>
    Tested-by: Daniel Wheeler <daniel.whee...@amd.com>
    Signed-off-by: Alex Deucher <alexander.deuc...@amd.com>
    (cherry picked from commit cfb2d41831ee5647a4ae0ea7c24971a92d5dfa0d)
    Cc: sta...@vger.kernel.org

 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)


I also tested a revert of commit f1c6be3999d2, and I can confirm that
without f1c6be3999d2 this issue is gone.

My machine spec: https://linux-hardware.org/?probe=4635c5fcb1
I have attached my build config, bisect log, and full kernel log below.

Aurabindo, could you please take a look as soon as possible?

--
Best Regards,
Mike Gavrilov.
