[clang] [CUDA] refactor in-header implementation of ld/st with different cache modes. (PR #190021)

Artem Belevich via cfe-commits Wed, 01 Apr 2026 12:08:56 -0700

Artem-B wrote:

> This is pretty mechanical.


It is. I just needed to add more variants, and figured it was time to generalize 
it all a bit to make my life easier.

> Though I do wonder if any of these could be implemented with builtins or some 
> other clang feature currently.

Funny enough, that was exactly the reason we skipped inclusion of CUDA's own 
headers: so we could implement our own shuffle functions, where inline asm just 
does not give us enough control to tell the compiler what those instructions do. 
We needed LLVM intrinsics and their properties to do it right.

In the case of ld/st instructions with cache modifiers, I think that plumbing 
them through builtins/intrinsics would not buy us much beyond what 
`asm volatile` or a memory clobber already does. The semantics of those cache 
operations are very specific to the GPUs, and the additional intrinsic-level 
properties do not help here as much as they did for shuffles. For the compiler 
it all boils down to "do not eliminate this unused read" and/or "do not reorder 
memory accesses around it". Inline asm is adequate for that.
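
To illustrate the point, here is a minimal sketch of a macro-generated load 
wrapper built on inline asm. It is not the code from the PR: the `SKETCH_*` 
macros and `sketch_*` function names are hypothetical, and it assumes 64-bit 
pointers into global memory. The plain-load fallback exists only so the sketch 
also builds as ordinary host C++.

```cpp
// Sketch only: all SKETCH_* / sketch_* names are hypothetical and are not the
// names used in clang's CUDA headers.
#if defined(__CUDACC__) || defined(__CUDA__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// SKETCH_DEFINE_LOAD generates one load wrapper per (cache modifier, type).
//   NAME   - name of the function to define
//   PTX_OP - full PTX opcode, e.g. "ld.global.cg.s32"
//   T      - C++ value type
//   OUT    - inline-asm output constraint for T, e.g. "=r" for 32-bit ints
#if defined(__CUDA_ARCH__)
// Device pass: emit the PTX instruction directly. `asm volatile` keeps the
// load even if its result is unused; the "memory" clobber keeps the compiler
// from reordering other memory accesses across it.
#define SKETCH_DEFINE_LOAD(NAME, PTX_OP, T, OUT)                          \
  static inline SKETCH_HD T NAME(const T *ptr) {                          \
    T ret;                                                                \
    asm volatile(PTX_OP " %0, [%1];" : OUT(ret) : "l"(ptr) : "memory");   \
    return ret;                                                           \
  }
#else
// Host pass or plain C++: there are no cache modifiers, so use a normal load.
#define SKETCH_DEFINE_LOAD(NAME, PTX_OP, T, OUT)                          \
  static inline SKETCH_HD T NAME(const T *ptr) { return *ptr; }
#endif

// Each new variant is a one-line instantiation.
SKETCH_DEFINE_LOAD(sketch_ldcg_int, "ld.global.cg.s32", int, "=r")
SKETCH_DEFINE_LOAD(sketch_ldcv_int, "ld.global.cv.s32", int, "=r")
SKETCH_DEFINE_LOAD(sketch_ldcg_float, "ld.global.cg.f32", float, "=f")
SKETCH_DEFINE_LOAD(sketch_ldcv_float, "ld.global.cv.f32", float, "=f")
```

Nothing in this sketch needs the compiler to understand what `.cg` or `.cv` 
means. All it has to do is keep the instruction and leave the surrounding 
memory accesses in order.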


https://github.com/llvm/llvm-project/pull/190021
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
