https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120444
            Bug ID: 120444
           Summary: [OpenMP][6.0] Add omp_target_memset/omp_target_memset_async
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

omp_target_memset and omp_target_memset_async should be implemented.

This requires on the CUDA side:

--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -281,0 +282,3 @@ CUresult cuMemcpy3DPeerAsync (const CUDA_MEMCPY3D_PEER *, CUstream);
+#define cuMemsetD8 cuMemsetD8_v2
+CUresult cuMemsetD8 (CUdeviceptr, unsigned char, size_t);
+CUresult cuMemsetD8Async (CUdeviceptr, unsigned char, size_t, CUstream);

https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__MEM_g6e582bf866e9e2fb014297bfaf354d7b.html#g6e582bf866e9e2fb014297bfaf354d7b
https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDA__MEM_gaef08a7ccd61112f94e82f2b30d43627.html#gaef08a7ccd61112f94e82f2b30d43627

And on the AMD/HSA side:

  hsa_status_t hsa_amd_memory_fill(void* ptr, uint32_t value, size_t count);

which requires (quoting GCC's include/hsa_ext_amd.h):

 * @param[in] count Number of uint32_t element to be set to the value.
 ...
 * @retval HSA_STATUS_ERROR_INVALID_ARGUMENT @p ptr is NULL or
 * not 4 bytes aligned

i.e. the pointer must be 4-byte aligned and the count is given in units of
32 bits (4 bytes), such that an omp_target_memcpy has to be called before
(for the unaligned leading bytes) and after (for a trailing remainder that
is not a multiple of 4 bytes).
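
The alignment constraint above can be illustrated with a small host-side sketch. It is not taken from libgomp; the names fill_split, split_fill and fill_pattern are hypothetical, and it assumes memset-like semantics in which the fill value is converted to unsigned char. It only computes how a request of n bytes at a given address would be divided into leading bytes, full uint32_t elements (the count for hsa_amd_memory_fill) and trailing bytes, plus the 32-bit value with the byte replicated four times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how an n-byte memset is divided.  */
struct fill_split
{
  size_t head;   /* leading bytes before the first 4-byte boundary */
  size_t words;  /* uint32_t elements, i.e. the 'count' for the fill call */
  size_t tail;   /* trailing bytes after the last full 4-byte element */
};

/* Split a fill of N bytes starting at address ADDR.  */
static struct fill_split
split_fill (uintptr_t addr, size_t n)
{
  struct fill_split s;
  size_t misalign = (size_t) (addr & 3);

  /* Bytes needed to reach 4-byte alignment, capped at N.  */
  s.head = misalign ? 4 - misalign : 0;
  if (s.head > n)
    s.head = n;

  s.words = (n - s.head) / 4;
  s.tail = (n - s.head) % 4;

  /* The three parts must cover exactly N bytes.  */
  assert (s.head + 4 * s.words + s.tail == n);
  return s;
}

/* Replicate the byte value C into all four bytes of a uint32_t.  */
static uint32_t
fill_pattern (int c)
{
  return (uint32_t) (unsigned char) c * UINT32_C (0x01010101);
}
```

For instance, a 10-byte fill starting one byte past a 4-byte boundary would need 3 leading bytes, one uint32_t element and 3 trailing bytes; only the middle part is eligible for hsa_amd_memory_fill, and the rest would be handled separately, for example by the omp_target_memcpy calls mentioned above.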