[Public] Inline.
From: Chris Friesen <chris.frie...@windriver.com>
Sent: Wednesday, September 17, 2025 11:45 AM
To: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; xinhui....@amd.com; amd-gfx@lists.freedesktop.org
Subject: questions around driver for AMD Instinct GPUs (for AI/ML)

> Hi,
>
> I'm wondering if you can give me some guidance.
>
> 1. I'm just starting to look at the use of AMD GPUs for AI/ML workloads,
>    and I'm wondering if there is any documentation available on the
>    tradeoffs between the in-tree and out-of-tree amdgpu drivers? For
>    commercial use should we expect to use the out-of-tree driver?

At the end of the day, it’s the same driver. There are basically two reasons we maintain a DKMS package:

1. Support for out-of-tree features (e.g. support for APIs like PeerDirect).
2. Support for newer devices on older enterprise distro kernels.

Most customers use the DKMS package today because it’s the most convenient way to get the ROCm stack and it contains compatibility for the widest range of kernels and APIs. However, we are working with distros to provide native ROCm packages via the standard package managers. You can already get most ROCm packages in Fedora and Debian, for example. Additionally, with a new enough kernel, you can use dma-buf rather than PeerDirect for P2P DMA on most devices.

> 2. Are there any known issues with using the amdgpu driver on a kernel
>    with PREEMPT_RT enabled?

YMMV. It’s not something we test heavily, and there are standard PREEMPT_RT pitfalls for slow operations like i2c bit banging that might get preempted. I’m also not sure how well dma_fences would work in that case.

> 3. What's AMD's policy on backporting in-tree driver improvements to LTS
>    kernels (6.12 for example)? It looks like only around 10% of the
>    amdgpu changes going into mainline are being ported back.

The rule for LTS kernels is that they only accept bug fixes, not new features. We try to be pretty aggressive about getting bug fixes backported. Compatibility with older kernels is one of the main reasons we have DKMS packages.

> Separately (and I'm not sure if this is the right place to ask), do you
> know why the ROCm compatibility matrix[1] indicates that the
> MI355X/MI350X/MI325X are not supported on Debian?

I’m not sure offhand.

Alex

> Thanks,
> Chris
>
> [1] https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html