Artem-B wrote:

> This is pretty mechanical.

It is. I just needed to add more variants, and figured it was time to generalize it all a bit to make my life easier.

> Though I do wonder if any of these could be implemented with builtins or some
> other clang feature currently.

Funny enough, that was exactly the reason we skipped inclusion of CUDA's own headers: so we could implement our own shuffle functions, where inline asm just does not give us enough control to tell the compiler what those instructions do. We needed LLVM intrinsics and their properties to do it right.

In the case of ld/st instructions with cache modifiers, I think that plumbing them through builtins/intrinsics would not buy us much beyond what `asm volatile` or a memory clobber already does. The semantics of those cache operations are very specific to the GPUs, and the additional intrinsic-level properties do not help here as much as they did for shuffles. For the compiler it all boils down to "do not eliminate this unused read" and/or "do not reorder memory accesses around it". Inline asm is adequate for that.

https://github.com/llvm/llvm-project/pull/190021
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
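
To make the point about cache-modifier loads concrete, here is a minimal sketch of what such an inline-asm wrapper could look like. It is not the code from the PR: the names `SKETCH_DEVICE` and `sketch_load_cg` are hypothetical, the pointer is assumed to refer to global memory, and the host branch is only a stand-in (assuming GCC or Clang) so the snippet can also be built as plain C++.

```cpp
// Hypothetical macro: device function under a CUDA compiler, plain inline otherwise.
#if defined(__CUDACC__) || defined(__CUDA__)
#define SKETCH_DEVICE __device__ inline
#else
#define SKETCH_DEVICE inline  // plain host C++ build, for experimentation only
#endif

// Hypothetical wrapper for a 32-bit load with the .cg (cache at L2) modifier.
// Assumes `ptr` points to global memory.
SKETCH_DEVICE int sketch_load_cg(const int *ptr) {
  int value;
#if defined(__CUDA_ARCH__)
  // `volatile` keeps the load even if `value` ends up unused;
  // the "memory" clobber keeps other memory accesses from moving across it.
  asm volatile("ld.global.cg.s32 %0, [%1];"
               : "=r"(value)
               : "l"(ptr)
               : "memory");
#else
  // Host stand-in: an ordinary load followed by a compiler-only barrier.
  value = *ptr;
  asm volatile("" ::: "memory");
#endif
  return value;
}
```

The two constraints the compiler needs are carried entirely by `volatile` and the `"memory"` clobber, which is why an intrinsic with richer properties would add little here.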
