[Public]

Inline.

From: Chris Friesen <chris.frie...@windriver.com>
Sent: Wednesday, September 17, 2025 11:45 AM
To: Deucher, Alexander <alexander.deuc...@amd.com>; Koenig, Christian 
<christian.koe...@amd.com>; xinhui....@amd.com; amd-gfx@lists.freedesktop.org
Subject: questions around driver for AMD Instinct GPUs (for AI/ML)


Hi,

I'm wondering if you can give me some guidance.

  1.  I'm just starting to look at the use of AMD GPUs for AI/ML workloads, and 
I'm wondering if there is any documentation available on the tradeoffs between 
the in-tree and out-of-tree amdgpu drivers?   For commercial use should we 
expect to use the out-of-tree driver?
At the end of the day, it’s the same driver.  There are basically two reasons 
we maintain a DKMS package:
1. Support for out-of-tree features (e.g. APIs like PeerDirect).
2. Support for newer devices on older enterprise distro kernels.
Most customers use the DKMS package today because it’s the most convenient way 
to get the ROCm stack and it is compatible with the widest range of 
kernels and APIs.  However, we are working with distros to provide native ROCm 
packages via the standard package managers.  You can already get most ROCm 
packages in Fedora and Debian, for example.  Additionally, with a new enough 
kernel, you can use dma-buf rather than PeerDirect for P2P DMA on most devices.
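
To tell which of the two builds a given system is actually running, a minimal Python sketch follows. It rests on an assumption that is not stated in this thread: that DKMS-built modules are installed under an `updates`, `dkms`, or `extra` directory in `/lib/modules/<version>/`, while the in-tree module lives under `kernel/`. Distros can differ, so treat the result as a hint. The helper names are hypothetical.

```python
import subprocess


def classify_module_path(path: str) -> str:
    """Guess whether a kernel module file is a DKMS or an in-tree build.

    Assumption: DKMS installs under 'updates', 'dkms' or 'extra', and
    in-tree modules live under 'kernel/'. This is a common convention,
    not a guarantee.
    """
    parts = path.split("/")
    if "dkms" in parts or "updates" in parts or "extra" in parts:
        return "dkms"
    if "kernel" in parts:
        return "in-tree"
    return "unknown"


def amdgpu_module_path() -> str:
    """Ask modinfo for the amdgpu module file of the running kernel."""
    result = subprocess.run(
        ["modinfo", "-n", "amdgpu"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        path = amdgpu_module_path()
        print(path, "->", classify_module_path(path))
    except (FileNotFoundError, subprocess.CalledProcessError):
        # modinfo is missing or the module is not available on this system.
        print("amdgpu module not found")
```

A built-in driver or an unusual install location is reported as "unknown" rather than guessed.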

  2.  Are there any known issues with using the amdgpu driver on a kernel with 
PREEMPT_RT enabled?
YMMV. It’s not something we test heavily, and the standard PREEMPT_RT pitfalls 
apply: slow operations like I2C bit banging might get preempted, and I’m not 
sure how well dma_fences would work in that case.
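
When collecting bug reports, it can help to record whether the kernel in question is an RT build at all. The Python sketch below assumes the kernel version string (what `uname -v` prints) carries a `PREEMPT_RT` tag on such kernels. That is the usual convention, but it is an assumption here, and the function name is hypothetical.

```python
import platform


def is_preempt_rt(version_string: str) -> bool:
    """Return True if a kernel version string advertises PREEMPT_RT.

    Assumption: RT kernels include the literal tag 'PREEMPT_RT' in the
    version string reported by uname -v.
    """
    return "PREEMPT_RT" in version_string


if __name__ == "__main__":
    # platform.version() returns the same string as `uname -v` on Linux.
    print("PREEMPT_RT kernel:", is_preempt_rt(platform.version()))
```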

  3.  What's AMD's policy on backporting in-tree driver improvements to LTS 
kernels (6.12 for example)?   It looks like only around 10% of the amdgpu 
changes going into mainline are being backported.
The rule for LTS kernels is that they only accept bug fixes, not new features.  
We try to be pretty aggressive about getting bug fixes backported.  
Compatibility with older kernels is one of the main reasons we have DKMS 
packages.

Separately (and I'm not sure if this is the right place to ask), do you know 
why the ROCm compatibility matrix[1] indicates that the MI355X/MI350X/MI325X 
are not supported on Debian?

I’m not sure offhand.

Alex

Thanks,

Chris

[1] https://rocm.docs.amd.com/en/latest/compatibility/compatibility-matrix.html
