Source: amdsmi
Version: 7.2.0-3
Severity: normal
Tags: patch upstream
Forwarded: https://github.com/ROCm/rocm-systems/pull/7250

Hello,

I tried to re-introduce the dependency of hwloc on amdsmi in hwloc
2.14.0-1, but it stumbled (again) on rsmi_init() being too verbose: if
there is no AMD hardware in the system, rsmi_init() complains loudly,
which breaks various autopkgtests and the like, e.g. openmpi:

https://ci.debian.net/packages/o/openmpi/unstable/amd64/71772823/

I thus had to disable the amdsmi dependency (again), which prevents people
who do have AMD devices from getting their locality information.

I have submitted a simple fix upstream. The Copilot bot suggested a
more invasive fix, and I am not sure when that discussion will converge.

I have attached the simple fix, which can already be applied in Debian.
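
For illustration, the message check that the patch extends boils down to
something like the sketch below. This is a standalone approximation, not
the upstream code: the helper name is hypothetical, and the real check in
rsmi_init() additionally requires e.error_code() == RSMI_INITIALIZATION_ERROR.
The two strings are the ones compared in the attached patch.

```cpp
#include <cstring>

// Hypothetical helper (not part of rocm_smi): returns true when the
// exception message is one of the two that the patched rsmi_init()
// treats as "no ROCm driver set up", i.e. the case that should be
// reported without complaining loudly.
static bool is_no_hardware_message(const char *what) {
  return !std::strcmp(what,
             "Failed to initialize rocm_smi library (KFD node discovery).")
      || !std::strcmp(what, "DiscoverAmdgpuDevices() failed.");
}
```

Matching on the exact message text is fragile, which is presumably why a
more invasive fix was suggested upstream, but it keeps the change minimal.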

Samuel

-- System Information:
Debian Release: forky/sid
  APT prefers testing
  APT policy: (990, 'testing'), (500, 'unstable-debug'), (500, 
'testing-debug'), (500, 'stable-security'), (500, 'stable-debug'), (500, 
'proposed-updates'), (500, 'oldstable-debug'), (500, 'oldoldstable'), (500, 
'buildd-unstable'), (500, 'unstable'), (500, 'stable'), (500, 'oldstable'), (1, 
'experimental-debug'), (1, 'buildd-experimental'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386, arm64

Kernel: Linux 7.0.10+deb14-amd64 (SMP w/22 CPU threads; PREEMPT)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=fr_FR.UTF-8, LC_CTYPE=fr_FR.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

-- 
Samuel
Now I know someone out there is going to claim, "Well then, UNIX is intuitive,
because you only need to learn 5000 commands, and then everything else follows
from that! Har har har!"
(Andy Bates in comp.os.linux.misc, on "intuitive interfaces", slightly
defending Macs.)
--- a/rocm_smi/src/rocm_smi.cc
+++ b/rocm_smi/src/rocm_smi.cc
@@ -498,8 +498,10 @@ rsmi_init(uint64_t flags) {
     } catch(const amd::smi::rsmi_exception& e) {
       smi.Cleanup();
       if (e.error_code() == RSMI_INITIALIZATION_ERROR &&
-          !strcmp(e.what(),
-               "Failed to initialize rocm_smi library (KFD node discovery).")) {
+          (!strcmp(e.what(),
+               "Failed to initialize rocm_smi library (KFD node discovery).")
+          || !strcmp(e.what(),
+               "DiscoverAmdgpuDevices() failed."))) {
         // This system does not actually have ROCM drivers set up
         // We were probably just called through dependency, just report the
         // error and log without complaining loudly.
