https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120444

            Bug ID: 120444
           Summary: [OpenMP][6.0] Add
                    omp_target_memset/omp_target_memset_async
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

omp_target_memset and omp_target_memset_async should be implemented.

This requires on the CUDA side:

--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -281,0 +282,3 @@ CUresult cuMemcpy3DPeerAsync (const CUDA_MEMCPY3D_PEER *, CUstream);
+#define cuMemsetD8 cuMemsetD8_v2
+CUresult cuMemsetD8 (CUdeviceptr, unsigned char, size_t);
+CUresult cuMemsetD8Async (CUdeviceptr, unsigned char, size_t, CUstream);

https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__MEM_g6e582bf866e9e2fb014297bfaf354d7b.html#g6e582bf866e9e2fb014297bfaf354d7b
https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__MEM_gaef08a7ccd61112f94e82f2b30d43627.html#gaef08a7ccd61112f94e82f2b30d43627


And on the AMD/HSA side:

hsa_status_t hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count);

which requires (quoting GCC's include/hsa_ext_amd.h):

 * @param[in] count Number of uint32_t element to be set to the value.
...
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL or
 * not 4 bytes aligned

i.e. the pointer must be 4-byte aligned and the size is counted in 32-bit
(4-byte) elements, such that an omp_target_memcpy has to be called before (for
the unaligned leading bytes) and after (for a trailing remainder that is not a
multiple of 4 bytes).
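
The split described above could be computed roughly as follows. This is only a
sketch: struct fill_split and split_fill are hypothetical names, not existing
libgomp code, and no HSA or OpenMP function is actually called. It assumes the
usual memset semantics, where only the low 8 bits of the value are used, so
the byte is replicated into all four bytes of the uint32_t that would be
passed to hsa_amd_memory_fill.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how a byte-granular memset range splits up.  */
struct fill_split
{
  size_t head_bytes;   /* Bytes before the first 4-byte-aligned address.  */
  size_t word_count;   /* Number of uint32_t elements in the aligned body.  */
  size_t tail_bytes;   /* Leftover bytes after the last full 32-bit word.  */
  uint32_t word_value; /* Fill byte replicated into all four bytes.  */
};

/* Hypothetical helper: split a fill of COUNT bytes starting at device
   address ADDR into an unaligned head, a 4-byte-aligned body counted in
   uint32_t elements, and a tail shorter than 4 bytes.  */
static struct fill_split
split_fill (uintptr_t addr, int val, size_t count)
{
  struct fill_split s;
  size_t misalign = addr & 3;

  /* Bytes needed to reach the next 4-byte boundary, capped at COUNT.  */
  s.head_bytes = misalign ? 4 - misalign : 0;
  if (s.head_bytes > count)
    s.head_bytes = count;

  size_t rest = count - s.head_bytes;
  s.word_count = rest / 4;
  s.tail_bytes = rest % 4;

  /* As with memset, only the low 8 bits of VAL are used.  */
  s.word_value = (uint32_t) (unsigned char) val * 0x01010101u;
  return s;
}
```

With such a split, the body would be the only part handed to
hsa_amd_memory_fill (address addr + head_bytes, count word_count), while the
head and tail, each at most 3 bytes, would go through the memcpy path
mentioned above.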
