I reran the hot plug tests at the commit I mentioned below and they
look OK. My ASIC is Navi 10; I also tested with Vega 10 and older
Polaris ASICs (whatever I had at home at the time). It's possible there
are extra issues on ASICs like yours which I didn't cover during testing.
andrey@andrey-test:~/drm$ sudo ./build/tests/amdgpu/amdgpu_test -s 13
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
The ASIC NOT support UVD, suite disabled
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
The ASIC NOT support VCE, suite disabled
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
The ASIC NOT support UVD ENC, suite disabled.
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
Don't support TMZ (trust memory zone), security suite disabled
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
/usr/local/share/libdrm/amdgpu.ids: No such file or directory
Peer device is not opened or has ASIC not supported by the suite, skip
all Peer to Peer tests.
CUnit - A unit testing framework for C - Version 2.1-3
http://cunit.sourceforge.net/
Suite: Hotunplug Tests
  Test: Unplug card and rescan the bus to plug it back .../usr/local/share/libdrm/amdgpu.ids: No such file or directory
passed
  Test: Same as first test but with command submission .../usr/local/share/libdrm/amdgpu.ids: No such file or directory
passed
  Test: Unplug with exported bo .../usr/local/share/libdrm/amdgpu.ids: No such file or directory
passed
Run Summary:    Type  Total    Ran Passed Failed Inactive
              suites     14      1    n/a      0        0
               tests     71      3      3      0        1
             asserts     21     21     21      0      n/a
Elapsed time = 9.195 seconds
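
For reference, the first test ("Unplug card and rescan the bus to plug
it back") comes down to roughly the sysfs sequence sketched here. This
is only an illustration and not the libdrm test code: the helper names,
the sysfs_root parameter (there so the sequence can be dry-run against a
scratch directory) and the PCI address in the usage note are all
assumptions.

```python
import os


def write_attr(path, value="1"):
    """Write a value to a sysfs-style attribute file."""
    with open(path, "w") as attr:
        attr.write(value)


def unplug_and_rescan(bdf, sysfs_root="/sys"):
    """Remove one PCI device, then rescan the bus so it is re-enumerated.

    bdf is the PCI address, e.g. "0000:03:00.0" (hypothetical here).
    On a real system this needs root and will detach the GPU.
    """
    remove_attr = os.path.join(sysfs_root, "bus/pci/devices", bdf, "remove")
    rescan_attr = os.path.join(sysfs_root, "bus/pci/rescan")
    write_attr(remove_attr)   # hot-unplug the device
    write_attr(rescan_attr)   # rescan the bus to plug it back
    return remove_attr, rescan_attr
```

On real hardware this would be called as root with the card's own
address, e.g. unplug_and_rescan("0000:03:00.0"); pointing sysfs_root at
a temporary directory lets you check the paths without touching a
device.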
Andrey
On 2022-04-20 11:44, Andrey Grodzovsky wrote:
The only issue I see on Radeon VII is the same sysfs crash we already
fixed, so you can use the same fix. I haven't seen the MI200 issue yet,
but I also haven't tested on MI200, so I never ran into it before. I
need to test it when I get the time.
So try that fix on Radeon VII again to see if you pass the tests (the
warnings should all be minor issues).
Andrey
On 2022-04-20 05:24, Shuotao Xu wrote:
That's a problem. The latest working baseline I tested and confirmed
passing the hotplug tests is this branch and commit:
https://gitlab.freedesktop.org/agd5f/linux/-/commit/86e12a53b73135806e101142e72f3f1c0e6fa8e6
which is amd-staging-drm-next. 5.14 was the branch where we upstreamed
the hotplug code, but it has had a lot of regressions over time due to
new changes (that's why I added the hotplug tests, to try and catch
them early). It would be best to run this branch on MI100 so we have a
clean baseline, and only after confirming that this particular branch
at this commit passes the libdrm tests start adding the KFD-specific
add-ons. Another option, if you can't work with MI100 and this branch,
is to try a different ASIC that does work with this branch (if
possible).
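
If it helps to check the "clean baseline" condition mechanically, one
option is to read the CUnit run summary that amdgpu_test prints at the
end of a run. A minimal sketch, assuming the summary rows look like the
ones in this thread (suites/tests/asserts followed by five columns); the
function names are made up for illustration:

```python
def parse_cunit_summary(text):
    """Return {row: {column: value}} for the suites/tests/asserts rows."""
    columns = ["total", "ran", "passed", "failed", "inactive"]
    summary = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0] in ("suites", "tests", "asserts"):
            summary[fields[0]] = dict(zip(columns, fields[1:]))
    return summary


def baseline_is_clean(text):
    """True only if all three summary rows are present with zero failures."""
    summary = parse_cunit_summary(text)
    return len(summary) == 3 and all(
        row["failed"] == "0" for row in summary.values()
    )
```

Feeding it the captured output of a hotunplug suite run and checking
baseline_is_clean() gives a simple pass/fail answer before any KFD
changes go on top.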
Andrey
OK, I tried both this commit and the HEAD of amd-staging-drm-next on
two GPUs (MI100 and Radeon VII); neither passed the libdrm hotunplug
tests. I might be able to gain access to an MI200, but I suspect it
would work.
I copied the complete dmesg logs below and highlighted the Oopses for
you.
Radeon VII: