Initially, rocm.eclass append xnack[1,2] feature flag to gfx9 GPUs,
since ROCm upstream does this in many of their math libraries, e.g.
rocBLAS [3]. The list includes gfx90a:xnack+, indicating xnack is usable
for MI200 series, thus rocm.eclass append :xnack+ to gfx90a.

But it turns out xnack- is also common for MI200 series, restricting to
xnack+ produces incompatible GPU kernel with xnack- mode.

Also, community also explores using xnack on other gfx9 GPU [4,5], which
is previously restricted to xnack- in rocm.eclass.

By not appending xnack feature flag, GPU kernels are compiled to "xnack
any" mode, which can be run in either mode, potentially scarifying some
performance [6,7], with no direct evidence. rocFFT reports no
performance penalty[8].

For the reason above, do not append xnack feature flag to AMDGPU_TARGETS,
which is compatible with GPUs operate in both xnack mode.

[1] https://wiki.gentoo.org/wiki/ROCm#XNACK_target_feature
[2] https://rocm.docs.amd.com/en/latest/conceptual/gpu-memory.html#xnack
[3] 
https://github.com/ROCm/rocBLAS/blob/release/rocm-rel-5.0/CMakeLists.txt#L201
[4] https://niconiconi.neocities.org/tech-notes/xnack-on-amd-gpus/
[5] https://arxiv.org/abs/2401.02680
[6] https://llvm.org/docs/AMDGPUUsage.html#target-features
[7] 
https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes
[8] 
https://github.com/ROCm/rocFFT/commit/cd2689360ba3b3579d044d8925838ff307b4b4cf

Signed-off-by: Yiyang Wu <xgreenlandfor...@gmail.com>
---
 eclass/rocm.eclass | 19 ++-----------------
 1 file changed, 2 insertions(+), 17 deletions(-)

diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index 9804ecde97d0..e03e8bdd507a 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -1,4 +1,4 @@
-# Copyright 2022-2023 Gentoo Authors
+# Copyright 2022-2024 Gentoo Authors
 # Distributed under the terms of the GNU General Public License v2
 
 # @ECLASS: rocm.eclass
@@ -201,22 +201,7 @@ unset -f _rocm_set_globals
 # Append default target feature to GPU arch. See
 # https://llvm.org/docs/AMDGPUUsage.html#target-features
 get_amdgpu_flags() {
-       local amdgpu_target_flags
-       for gpu_target in ${AMDGPU_TARGETS}; do
-       local target_feature=
-               case ${gpu_target} in
-                       gfx906|gfx908)
-                               target_feature=:xnack-
-                               ;;
-                       gfx90a)
-                               target_feature=:xnack+
-                               ;;
-                       *)
-                               ;;
-               esac
-               amdgpu_target_flags+="${gpu_target}${target_feature};"
-       done
-       echo "${amdgpu_target_flags}"
+       echo $(printf "%s;" ${AMDGPU_TARGETS[@]})
 }
 
 # @FUNCTION: check_amdgpu
-- 
2.41.0


Reply via email to