On 2024-04-04 09:05, Christian Kastner wrote:
> The issue is already visible with AMD_LOG_LEVEL=1, it's the lack of PCIe
> atomics:
>
>> [ RUN ] rocfft_UnitTest.default_load_callback_complex_single
>> :1:rocvirtual.cpp :2949: 1796815625 us: [pid:1917
>> tid:0x7f4a2102c980] Pcie atomics not enabled, hostcall not supported
>> :1:rocvirtual.cpp :3289: 1796816120 us: [pid:1917
>> tid:0x7f4a2102c980] AQL dispatch failed
> clients/tests
>
> In an older ROCm ticket, a workaround to enable PCIe atomics in the
> guest was discussed [1], but I never got this to work. The relevant bit
> is not set after invoking setpci.

In a more recent issue [2], a lack of PCIe atomics was also discovered on
physical hardware (whether atomics are available can depend on the CPU
and/or the PCIe slot). In that issue, it was stated that updating to ROCm
6.0 (and PyTorch) resolved the problem.

I just rebuilt rocfft at version 6.0.2, but the issue is still present.
That was naive, though: there are other components in the stack still
below 6.0 that could affect this.

> [1]
> https://github.com/ROCm/ROCK-Kernel-Driver/issues/26#issuecomment-313857180

[2] https://github.com/ROCm/ROCm/issues/2429