This patch adds the currently unused static function 'is_integrated_apu'
to libgomp/plugin/plugin-{gcn,nvptx}.c.

While currently not in use ('#if 0'), I'd like to add it already now
as prep work. The idea is to use it to enable self mapping automatically
by default when mapping is pointless (copying data despite sharing the
same memory controller). [See below for more.]

Any comments?

* * *

Regarding the property check:

For Nvidia GPUs, I am not sure whether it is useful, as there does not
seem to be any integrated GPU so far - not even Grace Hopper.

For AMD GPUs, I have no idea whether it works for older GPUs, but for
MI300A it works - though XNACK needs to be enabled. It should also work
if the APU does not support XNACK.


Side remark: I think we eventually need to switch to per-device
capabilities on top of the per-device-type (Nvidia, GCN) capabilities
in order to handle multi-GPU systems (e.g. an AMD APU plus a separate
discrete AMD GPU) in a more fine-grained way; see the sketch below. This
applies likewise to auto self mapping and to 'omp requires unified_shared_memory'.

On the other hand, having only one type of Nvidia or AMD GPU is common,
and disabling GPUs is also one way out (e.g. via ROCR_VISIBLE_DEVICES).

For AMD, this goes in lockstep with compiled-for vs. not-compiled-for
GPUs, as only one type is supported at a time. We should also eventually
handle this better (host fallback if no code is available for a GPU's
ISA, supporting multiple ISAs in one binary), but that's not an urgent
feature.
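
To illustrate the per-device idea, here is a hypothetical sketch - the
struct and field names are placeholders, not a proposed interface:

/* Hypothetical: one capability record per device/agent instead of one
   per device type.  */
struct gomp_device_caps
{
  bool supports_usm;  /* Unified shared memory usable on this device.  */
  bool is_apu;        /* Device shares its memory controller with the host.  */
};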

* * *

Regarding auto-USM support:

The new functions currently aren't used because, for global variables
in static memory ('declare target'), 'map' still needs to copy data
to/from those. - One solution is to only have 'declare target link'
variables, as GCC initializes them (with USM) such that they link to
the host variable.

Thus, a prerequisite for this feature is the currently missing mapping
support in libgomp (OpenMP and OpenACC? Or only OpenMP as a starter?),
and for 'omp requires self_maps', the conversion to 'link' should also
happen automatically.
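
For illustration (not part of the patch), the two variants look as
follows:

/* Plain 'declare target': the device gets its own copy in static
   memory, so 'map' must copy data to/from it.  */
int tab[100];
#pragma omp declare target to(tab)

/* 'declare target link': the device-side symbol is initialized (with
   USM) to refer to the host variable, so no copying is needed.  */
int linked_tab[100];
#pragma omp declare target link(linked_tab)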

Once the mapping support is in, auto-USM can be enabled, and I guess
we also want to have an environment variable (see the sketch after
this list) to toggle between:
- always map (to override auto USM),
- use self-maps (for systems that support USM but aren't APUs), and
- force-use self-maps (for systems that report not supporting USM,
  e.g. when one GPU supports it but another does not, or similar issues).
  [At your own risk. Better is to disable such GPUs, e.g. via
  ROCR_VISIBLE_DEVICES, but it might still be useful at times.]
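
A minimal parsing sketch - the variable name 'GOMP_SELF_MAPS' and its
values are hypothetical, not an agreed interface:

#include <stdlib.h>
#include <string.h>

enum self_maps_mode
{
  SELF_MAPS_MAP,    /* "map": always map, overriding auto USM.  */
  SELF_MAPS_AUTO,   /* Unset: decide via is_integrated_apu.  */
  SELF_MAPS_ON,     /* "on": self maps on USM-capable systems.  */
  SELF_MAPS_FORCE   /* "force": self maps even without reported USM
                       support - at the user's own risk.  */
};

static enum self_maps_mode
parse_self_maps_env (void)
{
  const char *s = getenv ("GOMP_SELF_MAPS");
  if (s == NULL)
    return SELF_MAPS_AUTO;
  if (strcmp (s, "map") == 0)
    return SELF_MAPS_MAP;
  if (strcmp (s, "force") == 0)
    return SELF_MAPS_FORCE;
  return SELF_MAPS_ON;
}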

Tobias

PS: The APU check was tried with MI210 (false) and MI300A (true).
For the latter, both HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT
and HSA_AMD_SYSTEM_INFO_XNACK_ENABLED are only true if HSA_XNACK=1
was set. [The HSA_AMD_AGENT_INFO_SVM_DIRECT_HOST_ACCESS flag seems
to go in lockstep with HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU, except
that the former is unsurprisingly always true for the CPUs.]
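
For reference, a minimal standalone sketch of how those system-level
attributes can be queried via hsa_system_get_info (error checking
omitted for brevity):

#include <stdbool.h>
#include <stdio.h>
#include <hsa.h>
#include <hsa_ext_amd.h>

int
main (void)
{
  bool svm_default = false, xnack_enabled = false;
  hsa_init ();
  hsa_system_get_info ((hsa_system_info_t)
                       HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT,
                       &svm_default);
  hsa_system_get_info ((hsa_system_info_t)
                       HSA_AMD_SYSTEM_INFO_XNACK_ENABLED, &xnack_enabled);
  printf ("SVM accessible by default: %d, XNACK enabled: %d\n",
          (int) svm_default, (int) xnack_enabled);
  hsa_shut_down ();
  return 0;
}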

For Grace Hopper, CU_DEVICE_ATTRIBUTE_INTEGRATED = 0. I think GH
is currently the most integrated GPU by Nvidia, and it still uses
separate memory controllers (albeit with a fast interconnect).
[I ignore the embedded CPU/GPUs that Nvidia also offers.] Possibly,
the Intel + Nvidia collaboration will yield a CPU+GPU system for
which the flag will be true.
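
For completeness, checking this on a given system only takes a few
lines with the CUDA driver API (sketch; error checking omitted):

#include <stdio.h>
#include <cuda.h>

int
main (void)
{
  CUdevice dev;
  int integrated = -1;
  cuInit (0);
  cuDeviceGet (&dev, 0);
  cuDeviceGetAttribute (&integrated, CU_DEVICE_ATTRIBUTE_INTEGRATED, dev);
  printf ("CU_DEVICE_ATTRIBUTE_INTEGRATED = %d\n", integrated);
  return 0;
}
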
libgomp: Add is_integrated_apu function to plugin/plugin-{gcn,nvptx}.c

The added function is currently '#if 0' but is planned to be used to enable
self mapping automatically.  A prerequisite for auto self maps is still the
mapping of 'declare target' variables (if any; in libgomp) or the conversion
of all 'declare target' variables to 'declare target link' in the compiler
(as required for 'omp requires self_maps').

include/ChangeLog:

	* hsa_ext_amd.h (enum hsa_amd_agent_info_s): Add
	HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES.
	(enum): Add HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (is_integrated_apu): New; currently '#if 0'.
	* plugin/plugin-nvptx.c (is_integrated_apu): Likewise.

 include/hsa_ext_amd.h         | 10 +++++++-
 libgomp/plugin/plugin-gcn.c   | 58 +++++++++++++++++++++++++++++++++++++++++++
 libgomp/plugin/plugin-nvptx.c | 16 ++++++++++++
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/include/hsa_ext_amd.h b/include/hsa_ext_amd.h
index c1c16536621..e29e88090eb 100644
--- a/include/hsa_ext_amd.h
+++ b/include/hsa_ext_amd.h
@@ -168,9 +168,17 @@ typedef enum hsa_amd_agent_info_s {
    * selective workarounds for hardware errata.
    * The type of this attribute is uint32_t.
    */
-  HSA_AMD_AGENT_INFO_ASIC_REVISION = 0xA012
+  HSA_AMD_AGENT_INFO_ASIC_REVISION = 0xA012,
+
+  /* Bitmask with memory properties of the agent.  */
+  HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES = 0xA114
 } hsa_amd_agent_info_t;
 
+
+enum {
+  HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU = (1 << 0)
+};
+
 typedef struct hsa_amd_hdp_flush_s {
   uint32_t* HDP_MEM_FLUSH_CNTL;
   uint32_t* HDP_REG_FLUSH_CNTL;
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 18f01e09002..97249335a03 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3331,6 +3331,59 @@ gcn_exec (struct kernel_info *kernel,
 /* }}}  */
 /* {{{ Generic Plugin API  */
 
+#if 0  /* TODO: Use to enable self-mapping/USM automatically.  */
+
+/* Return TRUE if the GPU is an APU, i.e. the GPU is integrated with the CPU
+   such that both use the same memory controller and mapping or memory
+   migration is pointless.  If CHECK_XNACK is TRUE, it additionally requires
+   that the GPU has *no* XNACK support; otherwise, FALSE is returned.
+
+   In theory, enabling unified-shared memory for APUs should always work,
+   however, with AMD GPUs some APUs (e.g. MI300A) still require XNACK to be
+   enabled as it is required to handle page faults.
+
+   Thus, for unified-shared memory access, either of the following must hold:
+   * HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is TRUE
+     This implies that all GPUs support USM access, either directly (as APU)
+     or via page migration.  For MI300A, this is only the case if
+     HSA_AMD_SYSTEM_INFO_XNACK_ENABLED is TRUE.
+   * The GPU is an APU *and* it does not support XNACK.  */
+
+static bool
+is_integrated_apu (struct agent_info *agent, bool check_xnack)
+{
+  enum {
+    HSACO_ATTR_UNSUPPORTED,
+    HSACO_ATTR_OFF,
+    HSACO_ATTR_ON,
+    HSACO_ATTR_ANY,
+    HSACO_ATTR_DEFAULT
+  };
+
+  bool is_apu;
+  uint8_t mem_prop[8];
+  hsa_status_t status;
+
+  status = hsa_fns.hsa_agent_get_info_fn (
+	     agent->id, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES,
+	     mem_prop);
+  _Static_assert (HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU < 8,
+		  "HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU < 8");
+  is_apu = (status == HSA_STATUS_SUCCESS
+	    && (mem_prop[0] & (1 << HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU)));
+
+  if (check_xnack)
+    switch (agent->device_isa)
+      {
+#define GCN_DEVICE(name, NAME, ELF, ISA, XNACK, ...) \
+      case ELF: return is_apu && (XNACK == HSACO_ATTR_UNSUPPORTED);
+#include "../../gcc/config/gcn/gcn-devices.def"
+      default: return false;  /* Just to be safe.  */
+      }
+  return is_apu;
+}
+#endif
+
 /* Return the name of the accelerator, which is "gcn".  */
 
 const char *
@@ -3417,6 +3470,11 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
       if (status != HSA_STATUS_SUCCESS)
 	GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
 			   "failed");
+      /* TODO: In principle, iterating over the GPUs and calling
+	 'is_integrated_apu (agent, true)' would work on mixed USM/non-USM
+	 systems to find USM-supporting devices - but, currently, capabilities
+	 are per device type (GCN, NVPTX, ...) not per device/agent
+	 or per ISA.  */
       if (!b)
 	return -1;
     }
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index eb7b5e59d8f..b024e3d3568 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1246,6 +1246,22 @@ nvptx_get_current_cuda_context (void)
   return nvthd->ptx_dev->ctx;
 }
 
+#if 0  /* TODO: Use to enable self-mapping/USM automatically.  */
+
+/* Return TRUE if the GPU is integrated with host memory, i.e. GPU and
+   host share the same memory controller.  As of Oct 2025, no such
+   Nvidia GPU seems to exist.  */
+static bool
+is_integrated_apu (struct ptx_device *ptx_dev)
+{
+  int pi;
+  CUresult r;
+  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
+			 CU_DEVICE_ATTRIBUTE_INTEGRATED, ptx_dev->dev);
+  return (r == CUDA_SUCCESS && pi == 1);
+}
+#endif
+
 /* Plugin entry points.  */
 
 const char *
