Re: [PATCH] libgomp, amdgcn, nvptx: Improve omp_target_is_accessible [PR121813]

Andrew Stubbs Mon, 08 Dec 2025 08:46:07 -0800

On 05/12/2025 13:50, Tobias Burnus wrote:

Andrew Stubbs wrote:

On 28/11/2025 12:16, Andrew Stubbs wrote:
This patch extends omp_target_is_accessible to check the actualdevice statusfor the memory region, on amdgcn and nvptx devices (rather than justchecking
if shared memory is enabled).


OpenMP 6.0 has the following clarification:

"If ptr is NULL or the implementation cannot guarantee
  accessibility, the routine returns zero."

Can you add at the top of target.c's omp_target_is_accessible

   if (ptr == NULL)
     return false;

or maybe also  'if (ptr == NULL || size == 0)'; the specification
requires that size is positive.

Could be then tested as:

for (int dev = 0; dev < omp_get_num_devices(); dev++)
   if (omp_is_accessible (nullptr, dev))
     __builtin_abort ();

That's PR 113213.  (Which is not listed + would be fixed by this).


Done.


[Talking of which, there is also PR 113216, which is very similar
to PR121813.]

* * *

-  /* TODO: Unified shared memory must be handled when available.  */
+  /* Unified shared memory (or true shared memory).  */
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    return true;


I think it makes more sense to remove the check here - and
handle it in the plugin. (More to this later.)

We still need it for the case where the plugin does not provide the"is_shared_ptr" API (maybe there's one out there?), so I've simplyswapped them around, so the plugin call comes first, and made it so that"false" from the plugin does not mean "check for other opinions".

* * *

Can you also update libgomp.texi,
https://gcc.gnu.org/onlinedocs/libgomp/omp_005ftarget_005fis_005faccessible.html
(a) adding the NULL / not garantee note.

(b) I think the following should be replaced by
the assumption that for device == host, all
pointers are accessible.
[We should do a better check eventually,
but presumably in a later patch; cf. below]
"Note that GCC’s current implementation assumes that ptr is a valid hostpointer. Therefore, all addresses given by ptr are assumed to beaccessible on the initial device. And, to err on the safe side, thismemory is only available on a non-host device that can access all hostmemory ([uniform] shared memory access)."


Done.

* * *

BTW: OpenMP 6 also removed "When called from within a target
region the effect is unspecified.", i.e. it is now device-callable.

Something to address eventually but IMHO there is no need to do
so as part of this patch.

* * *

+++ b/libgomp/libgomp-plugin.h
@@ -175,0 +176 @@ extern bool GOMP_OFFLOAD_managed_free (int, void *);
+extern bool GOMP_OFFLOAD_is_shared_ptr (int, const void *, size_t);


I think 'is_accessible_ptr' makes more sense than to talk
about 'is_shared_ptr' as that's what it is about.


Done.

Additionally, for future use, I'd prefer an 'int' (or enum)
instead of a 'bool' here:

'-1' Known to be accessible by that device but *not* host accessible
'1'  Known to be accessible by that device
      (and either also known to be host accessible or unknown host
      accessibility status)
'2'  Known to be device accessible if it is host accessible,
      the device can access host memory (USM).
'0'  Not device accessible / unknown

I have now implemented this so that positive values are "accessible"(possibly with extra information), and zero or negative values are"inaccessible" (possibly with a reason or caveat given).


So, 0 and 1 are clear,

Possible negative responses:

* "accessible only if extra checks pass", but assume inaccessibleotherwise. (Your "2" above).

 * "partially, but not wholly accessible".

Possible extra positive responses:
 * "accessible to device, but only device", which is ignorable information.
 * "accessible but slow/not recommended"

This seems more future proof than your scheme above?

The reasons is that we eventually want to handle:

  ptr = omp_target_alloc (nbytes, omp_default_device);
  int a = omp_target_is_accessible (ptr, omp_initial_device);

and

  ptr = omp_target_alloc (nbytes,  5 );
  int a = omp_target_is_accessible (ptr, 3 );

i.e. checking on the host for device-only allocated memory
and checking on the device for memory allocated on a different
device.

For the host or if the device returned '2', we have then to
walk all nonhost devices (but the selected device) to check
whether any of them returns '-1'; if not, we assume that it
is accessible.

* * *

Thus, I think it makes sense to have an 'int'. However,
looking at CUDA, I realize that for one check, we could
use CU_POINTER_ATTRIBUTE_DEVICE_POINTER and for the other
CU_POINTER_ATTRIBUTE_HOST_POINTER.

Thus, we presumably would need a boolean flag whether the
check should be done for device or host accessibility.
But presumably, we still want to stick to an int:
- unknown status
- known to be accessible by the host | this device
- known not to be accessible by the host | this device

Hence: consider preparing for this by adding a boolean
and returning an int, even if we don't implement this right now.

* * *

BTW: I think we want to add and support the following
testcase, which works with Nvidia but not with AMD:

-----------------------------------
#include <omp.h>

void check (int dev)
{
   constexpr int N = 10;
   constexpr int size = N*sizeof(int);
   int A[N] = {};

   void *ptr = omp_target_alloc (size, dev);

   if (ptr == nullptr || !omp_target_is_accessible (ptr, size, dev))
     __builtin_abort ();

   #pragma omp target device(dev) firstprivate(ptr)
   for (int i = 0; i < N; i++)
     ((int *)ptr)[i] = i + 1;

if (omp_target_memcpy (A, ptr, size, 0, 0, omp_initial_device, dev) != 0)

     __builtin_abort ();

   for (int i = 0; i < N; i++)
     if (A[i] != i + 1)
       __builtin_abort ();

   omp_target_free (ptr, dev);
}

int main ()
{
   check (omp_default_device);
   for (int dev = 0; dev <= omp_get_num_devices(); dev++)
     check (dev);
}
-----------------------------------

This test lacks any comments to tell me what it does or which parts aresignificant. Is the omp_target_memcpy somehow different to the mappingsin the other testcases?

* * *

Actually, I think the AMD example can be made working using
the following patch.

Interestingly, nagents == 0, but the first check is successful:

$1 = {size = 56, type = HSA_EXT_POINTER_TYPE_HSA, agentBaseAddress =0x7ffff4800000, hostBaseAddress = 0x7ffff4800000, sizeInBytes = 40,userData = 0x0, agentOwner = {handle = 5030896}, global_flags = 4}


i.e. the size fits and the owner is the current agent.

I think we should consider of either using this one instead or
in addition to the existing check.

Namely something like the following.
[Disclaimer: I have not checked the latest change I did,
not even compiled it - but the core part worked when I
tested it.]

+++ b/libgomp/plugin/plugin-gcn.c
@@ -238,2 +238,5 @@ struct hsa_runtime_fn_info
       size_t attribute_count);
+  hsa_status_t (*hsa_amd_pointer_info_fn)
+    (const void *, hsa_amd_pointer_info_t *, void *(*)(size_t),
+     uint32_t *, hsa_agent_t **);
  };
@@ -1500,2 +1503,3 @@ init_hsa_runtime_functions (void)
    DLSYM_OPT_FN (hsa_amd_svm_attributes_get)
+  DLSYM_OPT_FN (hsa_amd_pointer_info)
    return true;
@@ -3505,3 +3509,3 @@ gcn_exec (struct kernel_info *kernel,

-#if 0  /* TODO: Use to enable self-mapping/USM automatically.  */
+

/* FIXME: The auto-self-map feature depends on still mapping 'declaretarget'@@ -3558,3 +3562,2 @@ is_integrated_apu (struct agent_info *agent, boolcheck_xnack)

  }
-#endif

@@ -5269,3 +5272,4 @@ GOMP_OFFLOAD_is_shared_ptr (int device, const void*ptr, size_t size)

        || device < 0 || device > hsa_context.agent_count
-      || !hsa_fns.hsa_amd_svm_attributes_get_fn)
+      || (!hsa_fns.hsa_amd_svm_attributes_get_fn
+      && !hsa_fns.hsa_amd_pointer_info_fn))
      return false;

@@ -5274,2 +5278,37 @@ GOMP_OFFLOAD_is_shared_ptr (int device, constvoid *ptr, size_t size)


+  if (hsa_fns.hsa_amd_pointer_info_fn)
+    {
+      hsa_amd_pointer_info_t info;
+      uint32_t nagents;
+      hsa_agent_t *agents;
+      info.size = sizeof (hsa_amd_pointer_info_t);
+

+ hsa_status_t status = hsa_fns.hsa_amd_pointer_info_fn (ptr,&info, NULL,

+                                 &nagents, &agents);

+ if (status2 == HSA_STATUS_SUCCESS && info.type !=HSA_EXT_POINTER_TYPE_UNKNOWN)

+   {
+     /* Owns the pointer; can be true even for nagents == 0.  */
+     if (agent->id.handle == info.agentOwner.handle)
+       return info.sizeInBytes >= size;
+     for (unsigned i = 0; i < nagents; i++)
+       if (agent->id.handle == agents[0].handle)
+     return info.sizeInBytes >= size;
+     if (info.type != HSA_EXT_POINTER_TYPE_LOCKED)
+       return false;  // Not host memory and belonging to other agents.
+   }
+
+  /* Assume memory is host accessible.  */
+  bool svm_accessible;
+  hsa_system_info_t type = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;

+ hsa_status_t status2 = hsa_fns.hsa_system_get_info_fn (type,&svm_accessible);

+  if (status2 == HSA_STATUS_SUCCESS && svm_accessible)
+    return true;
+  if (is_integrated_apu (agent, /* xnack */ true
+    /* FIXME: pass !(HSA_AMD_SYSTEM_INFO_XNACK_ENABLED) status here? */ ))
+    return true;
+
+  /* FIXME: Will the following provide additional 'true' cases or not?  */
+  if (!hsa_fns.hsa_amd_svm_attributes_get_fn)
+    return false;
+

/* The HSA API doesn't seem to report for the whole range given, sowe call

---------------------------------------------------


I think we're running into feature creep here.

* * *

For Nvidia, while it somehow works:

(A) I think we should run it on the right device,
i.e.
   CUcontext old_ctx;
   CUDA_CALL_ERET (false, cuCtxPushCurrent, ptx_dev->ctx);
   ....
   CUDA_CALL_ASSERT (cuCtxPopCurrent, &old_ctx);

(B) I wonder whether it shouldn't be instead:

   CU_POINTER_ATTRIBUTE_DEVICE_POINTER

I assume that this will also deal with USM, but I have not
checked whether additionally a USM check would make sense,
similar to the AMD part above or whether that's already
covered that way.


Likewise.

* * *

Otherwise, nothing spotted, but I still want to reread the
patch.


v3 patch is attached.

Andrew

From 122a73bcc904e1ec3931d84f589838760369c76e Mon Sep 17 00:00:00 2001
From: Andrew Stubbs <[email protected]>
Date: Mon, 8 Dec 2025 16:18:02 +0000
Subject: [PATCH v3] libgomp, amdgcn, nvptx: Improve omp_target_is_accessible
 [PR121813]

This patch extends omp_target_is_accessible to check the actual device status
for the memory region, on amdgcn and nvptx devices (rather than just checking
if shared memory is enabled).

In both cases, we check the status of each 4k region within the given memory
range (assuming 4k pages should be safe for all the currently supported hosts)
and returns true if all of the pages report accessible.

The testcases have been modified to check that allocations marked accessible
actually are accessible (inaccessibility can't be checked without invoking
memory faults), and to understand that some parts of an array can be accessible
but other parts not (I have observed this intermittently for the stack memory
on amdgcn using the Fortran testcase, which can have the allocation span pages).

There's also new testcases for the various other memory modes, and for managed
memory.

include/ChangeLog:

	* cuda/cuda.h (CUpointer_attribute): New enum.
	(cuPointerGetAttribute): New prototype.

libgomp/ChangeLog:

	PR libgomp/121813
	* libgomp-plugin.h (GOMP_OFFLOAD_is_accessible_ptr): New prototype.
	* libgomp.h
	(struct gomp_device_descr): Add GOMP_OFFLOAD_is_accessible_ptr.
	* libgomp.texi: Update omp_target_is_accessible docs.
	* plugin/cuda-lib.def (cuPointerGetAttribute): New entry.
	* plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add
	hsa_amd_svm_attributes_get_fn.
	(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_get.
	(GOMP_OFFLOAD_is_accessible_ptr): New function.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_is_accessible_ptr): Likewise.
	* target.c (omp_target_is_accessible): Call is_accessible_ptr_func.
	(gomp_load_plugin_for_device): Add is_accessible_ptr.
	* testsuite/lib/libgomp.exp (check_effective_target_omp_usm): New.
	* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Rework
	to match more details of the GPU implementation.
	* testsuite/libgomp.fortran/target-is-accessible-1.f90: Likewise.
	* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-3.c: New test.
	* testsuite/libgomp.c-c++-common/target-is-accessible-4.c: New test.
---
 include/cuda/cuda.h                           |  9 ++
 libgomp/libgomp-plugin.h                      |  1 +
 libgomp/libgomp.h                             |  1 +
 libgomp/libgomp.texi                          | 14 ++-
 libgomp/plugin/cuda-lib.def                   |  1 +
 libgomp/plugin/plugin-gcn.c                   | 50 ++++++++++
 libgomp/plugin/plugin-nvptx.c                 | 64 +++++++++++--
 libgomp/target.c                              | 18 +++-
 libgomp/testsuite/lib/libgomp.exp             | 23 +++++
 .../target-is-accessible-1.c                  | 77 +++++++++++++--
 .../target-is-accessible-2.c                  |  5 +
 .../target-is-accessible-3.c                  |  4 +
 .../target-is-accessible-4.c                  | 28 ++++++
 .../target-is-accessible-1.f90                | 93 +++++++++++++++----
 14 files changed, 347 insertions(+), 41 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-2.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-3.c
 create mode 100644 libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-4.c

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 28510a3150c..ef62f150a5d 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -143,6 +143,13 @@ typedef enum {
   CU_MEMORYTYPE_UNIFIED = 0x04
 } CUmemorytype;
 
+typedef enum {
+  CU_POINTER_ATTRIBUTE_CONTEXT = 0x01,
+  CU_POINTER_ATTRIBUTE_MEMORY_TYPE = 0x02,
+  CU_POINTER_ATTRIBUTE_DEVICE_POINTER = 0x03,
+  CU_POINTER_ATTRIBUTE_HOST_POINTER = 0x04
+} CUpointer_attribute;
+
 typedef struct {
   size_t srcXInBytes, srcY;
   CUmemorytype srcMemoryType;
@@ -300,6 +307,8 @@ CUresult cuModuleGetGlobal (CUdeviceptr *, size_t *, CUmodule, const char *);
 CUresult cuModuleLoad (CUmodule *, const char *);
 CUresult cuModuleLoadData (CUmodule *, const void *);
 CUresult cuModuleUnload (CUmodule);
+CUresult cuPointerGetAttribute (CUmemorytype *, CUpointer_attribute,
+				CUdeviceptr);
 CUresult cuOccupancyMaxPotentialBlockSize(int *, int *, CUfunction,
 					  CUoccupancyB2DSize, size_t, int);
 typedef void (*CUstreamCallback)(CUstream, CUresult, void *);
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 71e74527e71..ad01d37ff35 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -173,6 +173,7 @@ extern void *GOMP_OFFLOAD_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_free (int, void *);
 extern void *GOMP_OFFLOAD_managed_alloc (int, size_t);
 extern bool GOMP_OFFLOAD_managed_free (int, void *);
+extern int GOMP_OFFLOAD_is_accessible_ptr (int, const void *, size_t);
 extern bool GOMP_OFFLOAD_page_locked_host_alloc (void **, size_t);
 extern bool GOMP_OFFLOAD_page_locked_host_free (void *);
 extern bool GOMP_OFFLOAD_dev2host (int, void *, const void *, size_t);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 46db7d41f32..0496c2a803e 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1425,6 +1425,7 @@ struct gomp_device_descr
   __typeof (GOMP_OFFLOAD_free) *free_func;
   __typeof (GOMP_OFFLOAD_managed_alloc) *managed_alloc_func;
   __typeof (GOMP_OFFLOAD_managed_free) *managed_free_func;
+  __typeof (GOMP_OFFLOAD_is_accessible_ptr) *is_accessible_ptr_func;
   __typeof (GOMP_OFFLOAD_page_locked_host_alloc) *page_locked_host_alloc_func;
   __typeof (GOMP_OFFLOAD_page_locked_host_free) *page_locked_host_free_func;
   __typeof (GOMP_OFFLOAD_dev2host) *dev2host_func;
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index ac96d2f29c7..602b1d0469e 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -2171,13 +2171,17 @@ and extending @var{size} bytes, is accessibly on the device specified by
 @var{device_num}.  If so, it returns a nonzero value and otherwise zero.
 
 The address given by @var{ptr} is interpreted to be in the address space of
-the device and @var{size} must be positive.
+the device and @var{size} must be positive.  NULL pointers and zero-length
+ranges always return zero.
 
 Note that GCC's current implementation assumes that @var{ptr} is a valid host
-pointer. Therefore, all addresses given by @var{ptr} are assumed to be
-accessible on the initial device. And, to err on the safe side, this memory
-is only available on a non-host device that can access all host memory
-([uniform] shared memory access).
+pointer. Therefore, all non-NULL addresses given by @var{ptr} are assumed to be
+accessible on the initial device.  The address is only reported as accessible
+on non-host devices if this is @emph{known} to be the case, or if the device
+reports that all memory is accessible (i.e. [unified] shared memory access).
+If the runtime is uncertain it may report accessible memory as inaccessible.
+For a memory range to be reported accessible, the whole range must be known to
+be accessible.
 
 Running this routine in a @code{target} region except on the initial device
 is not supported.
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 67c783d8566..f87987db85c 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -55,6 +55,7 @@ CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
 CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
+CUDA_ONE_CALL (cuPointerGetAttribute)
 CUDA_ONE_CALL (cuStreamAddCallback)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 92de6fb1b64..56b9934129a 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -233,6 +233,9 @@ struct hsa_runtime_fn_info
   hsa_status_t (*hsa_amd_svm_attributes_set_fn)
     (void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list,
      size_t attribute_count);
+  hsa_status_t (*hsa_amd_svm_attributes_get_fn)
+    (void* ptr, size_t size, hsa_amd_svm_attribute_pair_t* attribute_list,
+     size_t attribute_count);
 };
 
 /* As an HIP runtime is dlopened, following structure defines function
@@ -1494,6 +1497,7 @@ init_hsa_runtime_functions (void)
   DLSYM_OPT_FN (hsa_amd_memory_unlock)
   DLSYM_OPT_FN (hsa_amd_memory_async_copy_rect)
   DLSYM_OPT_FN (hsa_amd_svm_attributes_set)
+  DLSYM_OPT_FN (hsa_amd_svm_attributes_get)
   return true;
 #undef DLSYM_OPT_FN
 #undef DLSYM_FN
@@ -5258,6 +5262,52 @@ GOMP_OFFLOAD_managed_free (int device, void *ptr)
   return true;
 }
 
+int
+GOMP_OFFLOAD_is_accessible_ptr (int device, const void *ptr, size_t size)
+{
+  if (!init_hsa_context (false)
+      || device < 0 || device > hsa_context.agent_count
+      || !hsa_fns.hsa_amd_svm_attributes_get_fn)
+    return 0;
+
+  struct agent_info *agent = get_agent_info (device);
+
+  /* The HSA API doesn't seem to report for the whole range given, so we call
+     once for each page the range straddles.  */
+  const void *p = ptr;
+  size_t remaining = size;
+  do
+    {
+      /* Note: the access query returns in the attribute field.  */
+      struct hsa_amd_svm_attribute_pair_s attr = {
+	HSA_AMD_SVM_ATTRIB_ACCESS_QUERY, agent->id.handle
+      };
+      hsa_status_t status = hsa_fns.hsa_amd_svm_attributes_get_fn ((void*)p,
+								   remaining,
+								   &attr, 1);
+      if (status != HSA_STATUS_SUCCESS)
+	/* This shouldn't happen, but if it does we can't say the memory is
+	   accessible.  */
+	return 0;
+
+      switch (attr.attribute)
+	{
+	case HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE:
+	case HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE:
+	  break;
+	case HSA_AMD_SVM_ATTRIB_AGENT_NO_ACCESS:
+	default:
+	  return 0;
+	}
+
+      p = (void*)(((uintptr_t)p + 4096) & ~0xfffUL);
+      remaining = size - ((uintptr_t)p - (uintptr_t)ptr);
+    } while (p < ptr + size);
+
+  /* All pages were accessible.  */
+  return 1;
+}
+
 /* }}} */
 /* {{{ OpenACC Plugin API  */
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index dd8bcf9c507..194a3957cfa 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -353,6 +353,8 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool using_usm = false;
+
 /* "Native" GPU thread stack size.  */
 static unsigned native_gpu_thread_stack_size = 0;
 
@@ -1343,15 +1345,20 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
       && (omp_requires_mask
 	  & (GOMP_REQUIRES_UNIFIED_SHARED_MEMORY | GOMP_REQUIRES_SELF_MAPS)))
-    for (int dev = 0; dev < num_devices; dev++)
-      {
-	int pi;
-	CUresult r;
-	r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
-			       CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS, dev);
-	if (r != CUDA_SUCCESS || pi == 0)
-	  return -1;
-      }
+    {
+      for (int dev = 0; dev < num_devices; dev++)
+	{
+	  int pi;
+	  CUresult r;
+	  r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, &pi,
+				 CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS,
+				 dev);
+	  if (r != CUDA_SUCCESS || pi == 0)
+	    return -1;
+	}
+
+      using_usm = true;
+    }
   return num_devices;
 }
 
@@ -1906,6 +1913,45 @@ GOMP_OFFLOAD_managed_free (int ord, void *ptr)
   return GOMP_OFFLOAD_free (ord, ptr);
 }
 
+int
+GOMP_OFFLOAD_is_accessible_ptr (int device __attribute__((unused)),
+				const void *ptr, size_t size)
+{
+  /* USM implies access.  */
+  if (using_usm)
+    return 1;
+
+  /* The Cuda API does not permit testing a whole range, so we test each
+     4K page within the range.  If any page is inaccessible return false.  */
+  const void *p = ptr;
+  do
+    {
+      CUmemorytype mem_type;
+      CUresult res = CUDA_CALL_NOCHECK (cuPointerGetAttribute, &mem_type,
+					CU_POINTER_ATTRIBUTE_MEMORY_TYPE,
+					(CUdeviceptr)p);
+      if (res != CUDA_SUCCESS)
+	/* Memory is not registered, and therefore not accessible.  */
+	return 0;
+
+      switch (mem_type)
+	{
+	case CU_MEMORYTYPE_HOST:
+	case CU_MEMORYTYPE_UNIFIED:
+	case CU_MEMORYTYPE_DEVICE:
+	  break;
+	case CU_MEMORYTYPE_ARRAY:
+	default:
+	  return 0;
+	}
+
+      p = (void*)(((uintptr_t)p + 4096) & ~0xfffUL);
+    } while (p < ptr + size);
+
+  /* All pages were accessible.  */
+  return 1;
+}
+
 bool
 GOMP_OFFLOAD_page_locked_host_alloc (void **ptr, size_t size)
 {
diff --git a/libgomp/target.c b/libgomp/target.c
index af7c702d439..896bedfd172 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -5590,6 +5590,9 @@ omp_get_mapped_ptr (const void *ptr, int device_num)
 int
 omp_target_is_accessible (const void *ptr, size_t size, int device_num)
 {
+  if (ptr == NULL || size == 0)
+    return false;
+
   if (device_num == omp_default_device)
     device_num = gomp_get_default_device ();
 
@@ -5601,9 +5604,19 @@ omp_target_is_accessible (const void *ptr, size_t size, int device_num)
   if (devicep == NULL)
     return false;
 
-  /* TODO: Unified shared memory must be handled when available.  */
+  /* Managed memory (or other device feature).
+     is_accessible_ptr may, in future, report more than simply true or false,
+     but we can assume that positive responses are accessible, and
+     zero/negative responses are inaccessible.  */
+  if (devicep->is_accessible_ptr_func)
+    return (devicep->is_accessible_ptr_func (devicep->target_id, ptr, size)
+	    > 0);
+
+  /* Unified shared memory (or true shared memory).  */
+  if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
+    return true;
 
-  return devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM;
+  return false;
 }
 
 int
@@ -6009,6 +6022,7 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   DLSYM (free);
   DLSYM_OPT (managed_alloc, managed_alloc);
   DLSYM_OPT (managed_free, managed_free);
+  DLSYM_OPT (is_accessible_ptr, is_accessible_ptr);
   DLSYM_OPT (page_locked_host_alloc, page_locked_host_alloc);
   DLSYM_OPT (page_locked_host_free, page_locked_host_free);
   DLSYM (dev2host);
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 076b775560f..cce2e93f857 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -725,6 +725,29 @@ int main() {
 } } "-lhipblas" ]
 }
 
+# return 1 if OpenMP Unified Shared Memory is supported by offload devices
+
+proc check_effective_target_omp_usm { } {
+    if { [check_effective_target_offload_device_nvptx] 
+         || [check_effective_target_offload_target_amdgcn] } {
+	if [check_runtime usm_available_ {
+	    #include <omp.h>
+	    #pragma omp requires unified_shared_memory
+	    int main ()
+	    {
+	      int a;
+	      #pragma omp target map(from: a)
+		a = omp_is_initial_device ();
+	      return a;
+	    }
+	} ] {
+	  return 1
+	}
+    }
+
+    return 0
+}
+
 # return 1 if OpenMP Device Managed Memory is supported
 
 proc check_effective_target_omp_managedmem { } {
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c
index 71d6b2a8360..fff590e02fd 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c
@@ -1,4 +1,8 @@
 #include <omp.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+//#define __builtin_abort() __builtin_printf ("fail at line %d\n", __LINE__)
 
 int
 main ()
@@ -6,7 +10,7 @@ main ()
   int d = omp_get_default_device ();
   int id = omp_get_initial_device ();
   int n = omp_get_num_devices ();
-  void *p;
+  int *p = (int*)malloc (sizeof (int));
 
   if (d < 0 || d >= n)
     d = id;
@@ -26,24 +30,81 @@ main ()
   if (omp_target_is_accessible (p, sizeof (int), n + 1))
     __builtin_abort ();
 
-  /* Currently, a host pointer is accessible if the device supports shared
-     memory or omp_target_is_accessible is executed on the host. This
-     test case must be adapted when unified shared memory is avialable.  */
   int a[128];
   for (int d = 0; d <= omp_get_num_devices (); d++)
     {
+      if (omp_target_is_accessible (NULL, 1, d))
+	__builtin_abort ();
+
+      if (omp_target_is_accessible (p, 0, d))
+	__builtin_abort ();
+
+      /* Check if libgomp is treating the device as a shared memory device.  */
       int shared_mem = 0;
       #pragma omp target map (alloc: shared_mem) device (d)
 	shared_mem = 1;
+
+      int heap_accessible = shared_mem;
       if (omp_target_is_accessible (p, sizeof (int), d) != shared_mem)
-	__builtin_abort ();
+	{
+	  if (shared_mem)
+	    __builtin_abort ();
+
+	  /* shared_mem is false, but the memory is reading as accessible,
+	     so let's check that by reading it.  We should not do so
+	     unconditionally because if it's wrong then we'll probably get
+	     a memory fault.  */
+	  *p = 123;
+	  uintptr_t addr = (uintptr_t)p;
 
+	  #pragma omp target is_device_ptr(p) map(from:heap_accessible) \
+			     device(d)
+	    {
+	      if ((uintptr_t)p == addr && *p == 123)
+		heap_accessible = 1;
+	    }
+
+	  if (!heap_accessible)
+	    __builtin_abort ();
+	}
+
+      int stack_accessible = shared_mem;
       if (omp_target_is_accessible (a, 128 * sizeof (int), d) != shared_mem)
-	__builtin_abort ();
+	{
+	  if (shared_mem)
+	    __builtin_abort ();
+
+	  /* shared_mem is false, but the memory is reading as accessible,
+	     so let's check that by reading it.  We should not do so
+	     unconditionally because if it's wrong then we'll probably get
+	     a memory fault.  */
+	  int test_accessible = 123;
+	  uintptr_t addr = (uintptr_t)&test_accessible;
 
+	  #pragma omp target has_device_addr(test_accessible) \
+	                     map(from:stack_accessible) device(d)
+	    {
+	      if ((uintptr_t)&test_accessible == addr
+		  && test_accessible == 123)
+		stack_accessible = 1;
+	    }
+
+	  if (!stack_accessible)
+	    __builtin_abort ();
+	}
+      __builtin_printf ("device #%d: shared_mem=%d heap_accessible=%d "
+			"stack_accessible=%d\n",
+			d, shared_mem, heap_accessible, stack_accessible);
+
+      /* omp_target_is_accessible returns false if *any* of the array is
+	 inaccessible, so we only check the aggregate result.
+	 (Varying access observed on amdgcn without xnack.)  */
+      bool accessible = true;
       for (int i = 0; i < 128; i++)
-	if (omp_target_is_accessible (&a[i], sizeof (int), d) != shared_mem)
-	  __builtin_abort ();
+	if (!omp_target_is_accessible (&a[i], sizeof (int), d))
+	  accessible = false;
+      if (accessible != (shared_mem || stack_accessible))
+	__builtin_abort ();
     }
 
   return 0;
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-2.c
new file mode 100644
index 00000000000..205c0a41014
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-2.c
@@ -0,0 +1,5 @@
+/* { dg-require-effective-target omp_usm } */
+
+#pragma omp requires unified_shared_memory
+
+#include "target-is-accessible-1.c"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-3.c b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-3.c
new file mode 100644
index 00000000000..4e601b930c2
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-3.c
@@ -0,0 +1,4 @@
+/* { dg-require-effective-target offload_target_amdgcn_with_xnack } */
+/* { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mxnack=on" } */
+
+#include "target-is-accessible-1.c"
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-4.c b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-4.c
new file mode 100644
index 00000000000..0478fb0134f
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-4.c
@@ -0,0 +1,28 @@
+/* { dg-require-effective-target omp_managedmem } */
+
+#include <omp.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+//#define __builtin_abort() __builtin_printf ("fail at line %d\n", __LINE__)
+
+int
+main ()
+{
+  int *p = (int*)omp_alloc (sizeof (int), ompx_gnu_managed_mem_alloc);
+
+  *p = 42;
+  uintptr_t a_p = (uintptr_t)p;
+
+  #pragma omp target is_device_ptr(p)
+    {
+      if (*p != 42 || a_p != (uintptr_t)p)
+	__builtin_abort ();
+    }
+  if (!p
+      || !omp_target_is_accessible (p, sizeof (int),
+				    omp_get_default_device ()))
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90 b/libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90
index 2c25dca22f2..4cb401b18e4 100644
--- a/libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90
@@ -1,53 +1,112 @@
+! { dg-do run }
+
 program main
   use omp_lib
   use iso_c_binding
   implicit none (external, type)
-  integer :: d, id, n, shared_mem, i
+  integer :: d, id, n, shared_mem, i, heap_accessible, stack_accessible
+  integer, target :: test_accessible
+  integer, allocatable, target :: p(:)
   integer, target :: a(1:128)
-  type(c_ptr) :: p
+  integer(c_intptr_t) :: addr
+  logical :: condition
 
   d = omp_get_default_device ()
   id = omp_get_initial_device ()
   n = omp_get_num_devices ()
+  allocate (p(1))
 
   if (d < 0 .or. d >= n) &
     d = id
 
-  if (omp_target_is_accessible (p, c_sizeof (d), n) /= 1) &
+  if (omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), n) == 0) &
     stop 1
 
-  if (omp_target_is_accessible (p, c_sizeof (d), id) /= 1) &
+  if (omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), id) == 0) &
     stop 2
 
-  if (omp_target_is_accessible (p, c_sizeof (d), omp_initial_device) /= 1) &
+  if (omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), omp_initial_device) == 0) &
     stop 3
 
-  if (omp_target_is_accessible (p, c_sizeof (d), -6) /= 0) &  ! -6 = omp_default_device - 1
+  if (omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), -6) /= 0) &  ! -6 = omp_default_device - 1
     stop 4
 
-  if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) &
+  if (omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), n + 1) /= 0) &
     stop 5
 
   ! Currently, a host pointer is accessible if the device supports shared
   ! memory or omp_target_is_accessible is executed on the host. This
-  ! test case must be adapted when unified shared memory is avialable.
+  ! test case must be adapted when unified shared memory is available.
   do d = 0, omp_get_num_devices ()
-    shared_mem = 0;
+    ! Check if libgomp is treating the device as a shared memory device.
+    shared_mem = 0
     !$omp target map (alloc: shared_mem) device (d)
-      shared_mem = 1;
+      shared_mem = 1
     !$omp end target
 
-    if (omp_target_is_accessible (p, c_sizeof (d), d) /= shared_mem) &
-      stop 6;
+    heap_accessible = shared_mem
+    condition = omp_target_is_accessible (c_loc(p), c_sizeof(p(1)), d) /= shared_mem
+    if (condition) then
+      if (shared_mem /= 0) &
+        stop 6
+
+      ! shared_mem is false, but the memory is reading as accessible,
+      ! so let's check that by reading it. We should not do so
+      ! unconditionally because if it's wrong then we'll probably get
+      ! a memory fault.
+      p(1) = 123
+      addr = transfer(c_loc(p), addr)
+
+      !$omp target is_device_ptr(p) map(from:heap_accessible) device(d)
+        if (transfer(c_loc(p), addr) == addr .and. p(1) == 123) &
+          heap_accessible = 1
+      !$omp end target
+
+      if (heap_accessible == 0) &
+        stop 7
+    end if
+
+    stack_accessible = shared_mem
+    condition = omp_target_is_accessible (c_loc(a), 128 * c_sizeof(a(1)), d) /= shared_mem
+    if (condition) then
+      if (shared_mem /= 0) &
+        stop 8
 
-    if (omp_target_is_accessible (c_loc (a), 128 * sizeof (a(1)), d) /= shared_mem) &
-      stop 7;
+      ! shared_mem is false, but the memory is reading as accessible,
+      ! so let's check that by reading it. We should not do so
+      ! unconditionally because if it's wrong then we'll probably get
+      ! a memory fault.
+      test_accessible = 123
+      addr = transfer(c_loc(test_accessible), addr)
 
+      !$omp target has_device_addr(test_accessible) map(from:stack_accessible) device(d)
+        if (transfer(c_loc(test_accessible), addr) == addr &
+            .and. test_accessible == 123) &
+          stack_accessible = 1
+      !$omp end target
+
+      if (stack_accessible == 0) &
+        stop 9
+    end if
+
+    print '(A,I0,A,I0,A,I0,A,I0)', &
+      'device #', d, &
+      ': shared_mem=', shared_mem, &
+      ' heap_accessible=', heap_accessible, &
+      ' stack_accessible=', stack_accessible
+
+    ! omp_target_is_accessible returns false if *any* of the array is
+    ! inaccessible, so we only check the aggregate result.
+    ! (Varying access observed on amdgcn without xnack.)
+    condition = .true.
     do i = 1, 128
-      if (omp_target_is_accessible (c_loc (a(i)), sizeof (a(i)), d) /= shared_mem) &
-        stop 8;
+      if (omp_target_is_accessible (c_loc(a(i)), c_sizeof(a(i)), d) == 0) &
+        condition = .false.
     end do
-
+    if (condition .neqv. stack_accessible /= 0) &
+      stop 10
   end do
 
+  deallocate (p)
+
 end program main
-- 
2.51.0

Re: [PATCH] libgomp, amdgcn, nvptx: Improve omp_target_is_accessible [PR121813]

Reply via email to