https://bugs.freedesktop.org/show_bug.cgi?id=108493

--- Comment #9 from Timur Kristóf <ven...@msn.com> ---
I think I discovered a possible reason for this issue. If you look at the
DDEBUG dumps, it says in several places: "This slot was corrupted in GPU
memory". So I began to suspect something was wrong with the VRAM.

After looking around a bit, I found that the amdgpu driver does not honor the
voltage settings from the VBIOS, and runs the memory at lower voltages
instead. So basically the driver undervolts the VRAM without my asking it to.
I guess this might be considered a feature by some people.

However, when I manually edit pp_od_clk_voltage to increase the OD_MCLK
voltages, the card becomes stable and the GPU hang is gone. (Or at the very
least I haven't seen a hang yet, whereas previously it used to hang in less
than a minute.)
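
For reference, here is a minimal sketch of that manual edit. It assumes the
"m <state> <MHz> <mV>" and "c" (commit) command syntax that the kernel
documentation describes for pp_od_clk_voltage on cards of this generation, and
that overdrive is enabled on the running kernel. The function name
od_mclk_commands is made up for illustration, and the example values are simply
the 1750 MHz state raised to the 1000 mV the VBIOS asks for.

```shell
# Print the commands that would set one OD_MCLK state, followed by a commit.
# Arguments: <state index> <clock in MHz> <voltage in mV>
# Assumption: the driver accepts "m <state> <MHz> <mV>" and "c" on this card.
od_mclk_commands() {
    printf 'm %s %s %s\n' "$1" "$2" "$3"   # edit the given MCLK state
    printf 'c\n'                           # commit the edited table
}

# Usage (as root): write each command to the sysfs file separately.
# od_mclk_commands 2 1750 1000 | while IFS= read -r cmd; do
#     echo "$cmd" > /sys/class/drm/card0/device/pp_od_clk_voltage
# done
```

Printing the commands rather than writing them directly makes it possible to
check what would be sent before touching the hardware.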

In my case, the VBIOS wants to set the MCLK voltages to 1000 mV at all
frequencies, while amdgpu sets them to 750 mV, 800 mV, and 900 mV. And it turns
out that 900 mV is just too low for my card at 1750 MHz.

[root@timur-xps ~]# cat /sys/class/drm/card0/device/pp_od_clk_voltage 
OD_SCLK:
0:        300MHz        750mV
1:        588MHz        765mV
2:        952MHz        900mV
3:       1041MHz        975mV
4:       1106MHz       1031mV
5:       1168MHz       1093mV
6:       1209MHz       1143mV
7:       1244MHz       1150mV
OD_MCLK:
0:        300MHz        750mV
1:       1000MHz        800mV
2:       1750MHz        900mV
OD_RANGE:
SCLK:     300MHz       2000MHz
MCLK:     300MHz       2250MHz
VDDC:     750mV        1150mV
[root@timur-xps ~]# cat /sys/kernel/debug/dri/0/amdgpu_vbios > mybios.rom
[root@timur-xps ~]# pbec -i mybios.rom -s -r MEMORY_CLOCK

----
[DEFAULT] ATOM_MCLK_ENTRY Array
----

Entry: 0
        Frequency: 300 MHz.
        Voltage:. 1000 MV
Entry: 1
        Frequency: 1000 MHz.
        Voltage:. 1000 MV
Entry: 2
        Frequency: 1750 MHz.
        Voltage:. 1000 MV
----
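
The comparison between the two dumps above can also be done mechanically. The
following is only a sketch: it assumes the pp_od_clk_voltage layout shown above
(section headers starting with "OD_", rows of "index: clock voltage"), and the
function name mclk_below is hypothetical. The reference voltage is passed in by
hand, here the 1000 mV reported by pbec.

```shell
# List the OD_MCLK states whose voltage is below a reference value (in mV).
# Reads text in the pp_od_clk_voltage format on stdin.
mclk_below() {
    awk -v ref="$1" '
        # A section header: remember whether we are inside OD_MCLK.
        /^OD_/ { in_mclk = ($1 == "OD_MCLK:"); next }
        # A table row inside OD_MCLK: "<index>: <N>MHz <N>mV".
        in_mclk && NF == 3 {
            mv = $3
            sub(/mV$/, "", mv)              # "900mV" -> "900"
            if (mv + 0 < ref + 0) print $1, $2, $3
        }
    '
}

# Usage:
# mclk_below 1000 < /sys/class/drm/card0/device/pp_od_clk_voltage
```

With the table shown above and a reference of 1000 mV, all three MCLK states
are listed, since each of them is below what the VBIOS specifies.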


Here is some info about the VBIOS:

[root@timur-xps ~]# cat /sys/class/drm/card0/device/subsystem_device
0xe343
[root@timur-xps ~]# cat /sys/class/drm/card0/device/subsystem_vendor
0x1da2
[root@timur-xps ~]# cat /sys/class/drm/card0/device/vbios_version
113-D00034-S07
