https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/185889
Summary:
The ptx-nvidiacl __clc_mem_fence implementation can be made generic by using __scoped_atomic_thread_fence, which works as expected on NVPTX and SPIR-V. We do not replace the AMDGPU version because its dedicated built-in takes an extra argument that controls whether local memory or global memory is invalidated. Using the generic operation there would still be correct, but we would lose that minor optimization, so we should avoid the regression.

>From afe8e670b75699ef87a307d72ccf7e9bac28d51d Mon Sep 17 00:00:00 2001
From: Joseph Huber <[email protected]>
Date: Wed, 11 Mar 2026 09:18:51 -0500
Subject: [PATCH] [libclc] Add generic clc_mem_fence instruction

Summary:
This can be made generic, which works as expected on NVPTX and SPIR-V.
We do not replace this for AMDGPU because the dedicated built-in has an
extra argument that controls whether or not local memory or global
memory will be invalidated. It would be correct to use this generic
operation there, but we'd lose that minor optimization so we likely
should not regress.
---
 libclc/clc/lib/generic/CMakeLists.txt                        | 1 +
 .../lib/{ptx-nvidiacl => generic}/mem_fence/clc_mem_fence.cl | 4 +---
 libclc/clc/lib/ptx-nvidiacl/CMakeLists.txt                   | 1 -
 3 files changed, 2 insertions(+), 4 deletions(-)
 rename libclc/clc/lib/{ptx-nvidiacl => generic}/mem_fence/clc_mem_fence.cl (83%)

diff --git a/libclc/clc/lib/generic/CMakeLists.txt b/libclc/clc/lib/generic/CMakeLists.txt
index 7d7286de11f85..50b12d3bf4e3d 100644
--- a/libclc/clc/lib/generic/CMakeLists.txt
+++ b/libclc/clc/lib/generic/CMakeLists.txt
@@ -159,6 +159,7 @@ libclc_configure_source_list(CLC_GENERIC_SOURCES
   math/clc_tanpi.cl
   math/clc_tgamma.cl
   math/clc_trunc.cl
+  mem_fence/clc_mem_fence.cl
   misc/clc_shuffle.cl
   misc/clc_shuffle2.cl
   relational/clc_all.cl
diff --git a/libclc/clc/lib/ptx-nvidiacl/mem_fence/clc_mem_fence.cl b/libclc/clc/lib/generic/mem_fence/clc_mem_fence.cl
similarity index 83%
rename from libclc/clc/lib/ptx-nvidiacl/mem_fence/clc_mem_fence.cl
rename to libclc/clc/lib/generic/mem_fence/clc_mem_fence.cl
index fdec76ebc3c57..ded413308e56c 100644
--- a/libclc/clc/lib/ptx-nvidiacl/mem_fence/clc_mem_fence.cl
+++ b/libclc/clc/lib/generic/mem_fence/clc_mem_fence.cl
@@ -11,8 +11,6 @@
 _CLC_OVERLOAD _CLC_DEF void
 __clc_mem_fence(int memory_scope, int memory_order,
                 __CLC_MemorySemantics memory_semantics) {
-  (void)memory_order;
   (void)memory_semantics;
-  if (memory_scope & (__MEMORY_SCOPE_DEVICE | __MEMORY_SCOPE_WRKGRP))
-    __nvvm_membar_cta();
+  __scoped_atomic_thread_fence(memory_scope, memory_order);
 }
diff --git a/libclc/clc/lib/ptx-nvidiacl/CMakeLists.txt b/libclc/clc/lib/ptx-nvidiacl/CMakeLists.txt
index f345007e852e2..6eb0baab1c0bb 100644
--- a/libclc/clc/lib/ptx-nvidiacl/CMakeLists.txt
+++ b/libclc/clc/lib/ptx-nvidiacl/CMakeLists.txt
@@ -4,7 +4,6 @@ libclc_configure_source_list(CLC_PTX_NVIDIACL_SOURCES
   math/clc_rsqrt.cl
   math/clc_sinpi.cl
   math/clc_sqrt.cl
-  mem_fence/clc_mem_fence.cl
   relational/clc_isinf.cl
   synchronization/clc_work_group_barrier.cl
   workitem/clc_get_global_id.cl

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
