[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

Artem Belevich via Phabricator via cfe-commits Fri, 02 Jul 2021 16:21:31 -0700

tra added inline comments.


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.cu:781-786
+  // CHECK_PTX70_SM80: call {{.*}} @llvm.nvvm.wmma.m16n16k8.load.c.col.stride.f32
+  // expected-error-re@+1 {{'__mma_tf32_m16n16k8_ld_c' needs target feature (sm_80{{.*}},(ptx70{{.*}}}}
+  __mma_tf32_m16n16k8_ld_c(fdst, fsrc, ldm, 1);
+  // CHECK_PTX70_SM80: call {{.*}} @llvm.nvvm.wmma.m16n16k8.load.c.row.stride.f32
+  // expected-error-re@+1 {{'__mma_tf32_m16n16k8_ld_c' needs target feature (sm_80{{.*}},(ptx70{{.*}}}}
+  __mma_tf32_m16n16k8_ld_c(fdst, fsrc, ldm, 0);
----------------
tra wrote:
> This looks rather odd. We're calling a `tf32` builtin, but expect to see an 
> `f32` load intrinsic. Is that expected?
> 
> 
Never mind. I think I understand what's going on now.
CUDA headers use `__mma_tf32` builtins. `A` and `B` operate on opaque integer 
types, while `C` and `D` operate on floats.
On the PTX side, however, we have `wmma.load.{a,b}...tf32` but 
`wmma.load.c...f32`.

I guess it does make sense to keep LLVM intrinsic names close to the 
instructions they produce.
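
To make the naming split concrete, here is a rough C++ sketch of how the intrinsic name could be derived for the `m16n16k8` loads. The helper `wmmaLoadIntrinsicName` and the `Frag` enum are hypothetical and are not part of Clang or LLVM. The `c` names and the layout encoding (1 for `col`, 0 for `row`) come from the CHECK lines above; the `a`/`b` names are assumed by analogy with the PTX instruction names.

```cpp
#include <string>

// Hypothetical illustration only: not an actual Clang/LLVM API.
enum class Frag { A, B, C };

// Builds the LLVM intrinsic name for a tf32 m16n16k8 WMMA load.
// layout follows the last builtin argument in the test: 1 = col, 0 = row.
std::string wmmaLoadIntrinsicName(Frag frag, int layout) {
  const char *fragName =
      frag == Frag::A ? "a" : (frag == Frag::B ? "b" : "c");
  // A and B fragments carry tf32 data; the C fragment is plain f32,
  // matching wmma.load.{a,b}...tf32 vs. wmma.load.c...f32 in PTX.
  const char *elemType = frag == Frag::C ? "f32" : "tf32";
  const char *layoutName = layout == 1 ? "col" : "row";
  return std::string("llvm.nvvm.wmma.m16n16k8.load.") + fragName + "." +
         layoutName + ".stride." + elemType;
}
```

So a builtin with `tf32` in its name, such as `__mma_tf32_m16n16k8_ld_c`, would still map to an `.f32` intrinsic, since the suffix tracks the PTX instruction rather than the builtin.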





Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104847/new/

https://reviews.llvm.org/D104847

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
