[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Artem Belevich via Phabricator via cfe-commits Wed, 21 Apr 2021 11:37:35 -0700

tra added a comment.

Do you know if any existing code already uses the `__nvvm_*` builtins for 
`cp.async`? In other words, does nvcc provide them already or is it something 
we're free to name as we wish?
I do not see any relevant intrinsics mentioned in NVVM IR spec: 
https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html and I don't think NVCC's 
builtins are publicly documented anywhere.




================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:460-468
+TARGET_BUILTIN(__nvvm_redux_sync_add_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_s32, "SiSii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_add_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_min_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_max_u32, "UiUii", "", SM_80)
+TARGET_BUILTIN(__nvvm_redux_sync_and_b32, "iii", "", SM_80)
----------------
steffenlarsen wrote:
> tra wrote:
> > steffenlarsen wrote:
> > > tra wrote:
> > > > steffenlarsen wrote:
> > > > > tra wrote:
> > > > > > Instead of creating one builtin per integer variant, can we use a 
> > > > > > more generic builtin `__nvvm_redux_sync_add_i`, similar to how we 
> > > > > > handle `__nvvm_atom_add_gen_i` ?
> > > > > > 
> > > > > What gives me pause is that a for atomic minimum there are both 
> > > > > `__nvvm_atom_min_gen_i` and `__nvvm_atom_min_gen_ui` to distinguish 
> > > > > between signed and unsigned. What makes the difference?
> > > > > 
> > > > > That noted, I'll happily rename the builtins to be more in line with 
> > > > > the other builtins. `__nvvm_redux_sync_*_i` and 
> > > > > `__nvvm_redux_sync_*_ui` maybe?
> > > > > What gives me pause is that a for atomic minimum there are both 
> > > > > __nvvm_atom_min_gen_i and __nvvm_atom_min_gen_ui to distinguish 
> > > > > between signed and unsigned. What makes the difference?
> > > > 
> > > > Good point. We do not need unsigned variant for `add`.  We do need 
> > > > explicit signed and unsigned variants ad LLVM IR integer types do not 
> > > > take signedness into account, and the underlying min/max instructions 
> > > > do.  Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics? 
> > > > 
> > > > We may skip the `_i` suffix on logical ops as they only apply to 
> > > > integers anyways.
> > > > 
> > > Sorry, I completely missed your responses.
> > > 
> > > > Maybe, rename min_i/min_ui -> min/umin as LLVM does with atomics?
> > > 
> > > Sounds good to me. Would there also be umax and uadd?
> > > 
> > > > We may skip the _i suffix on logical ops as they only apply to integers 
> > > > anyways.
> > > 
> > > Absolutely. I'll make that happen! 
> > > Would there also be umax and uadd?
> > 
> > You will need `umax`, but there's no need for `uadd` as 2-complement 
> > addition is the same for signed/unsigned.
> > 
> > E.g `umax(0xffffffff, 1) -> 0xffffffff`, `max(-1,1) -> 1`, give different 
> > answers, but `uadd(0xffffffff, 1) -> 0` and `add(-1,1) -> 0`.
> Ah, of course. Though I do wonder as to the motivation of having signed and 
> unsigned add variants in PTX. I'll drop the unsigned variant.
It's for uniformity sake, I guess. All arithmetic ops in PTX operate on sXX/uXX 
arguments, though not all of them have to.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100124/new/

https://reviews.llvm.org/D100124

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D100124: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions

Reply via email to