rampitec requested changes to this revision.
rampitec added a comment.
This revision now requires changes to proceed.

Needs an IR test, a test for different supported targets, and a negative test 
for unsupported features.



================
Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:199
 
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f64, "dd*1di", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_global_atomic_fadd_f32, "ff*1fi", "t", "gfx90a-insts")
----------------
The correct attribute for this one is atomic-fadd-insts. In particular, it was
first added in gfx908, so you would need to test that target too.
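
To illustrate why the feature string matters, here is a toy C++ model of
feature gating. It is only a sketch and not Clang's actual implementation:
`BuiltinInfo`, `isBuiltinAvailable`, and the two feature lists are hypothetical
names and assumed contents chosen for this example, and real feature strings
may also combine several features, which the sketch ignores.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for one TARGET_BUILTIN entry: the builtin name plus the
// single target feature it requires (hypothetical type, not a Clang class).
struct BuiltinInfo {
  std::string Name;
  std::string RequiredFeature;
};

// Hypothetical helper: the builtin is usable only if the target's feature
// list contains the required feature.
bool isBuiltinAvailable(const BuiltinInfo &B,
                        const std::vector<std::string> &TargetFeatures) {
  return std::find(TargetFeatures.begin(), TargetFeatures.end(),
                   B.RequiredFeature) != TargetFeatures.end();
}

// Assumed, deliberately incomplete feature lists for illustration only.
std::vector<std::string> assumedGfx908Features() {
  return {"gfx8-insts", "atomic-fadd-insts"};
}

std::vector<std::string> assumedGfx90aFeatures() {
  return {"gfx8-insts", "atomic-fadd-insts", "gfx90a-insts"};
}
```

Under these assumptions, an entry that requires "gfx90a-insts" is rejected for
gfx908, while the same entry with "atomic-fadd-insts" is accepted on both
gfx908 and gfx90a, which is the case the extra test should cover.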


================
Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:205
+
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fadd_f64, "dd*1di", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_flat_atomic_fmin_f64, "dd*1di", "t", "gfx90a-insts")
----------------
The flat address space is 0, but these prototypes use `*1`, which is the global
address space.
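
As a small sketch of how the pointer address space is read from a prototype
string, the hypothetical helper below (`pointerAddressSpace` is not a Clang
function) extracts the number that follows the first `*`. It assumes the
AMDGPU numbering used in this review: 0 for flat, 1 for global, 3 for LDS.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the address space number written after the
// first '*' in a builtin prototype string, or -1 if the string has no
// pointer or the pointer carries no explicit address space number.
int pointerAddressSpace(const std::string &Proto) {
  std::size_t Star = Proto.find('*');
  if (Star == std::string::npos)
    return -1;

  std::size_t I = Star + 1;
  if (I >= Proto.size() || !std::isdigit(static_cast<unsigned char>(Proto[I])))
    return -1;

  // Accumulate the decimal digits that follow the '*'.
  int AddrSpace = 0;
  while (I < Proto.size() &&
         std::isdigit(static_cast<unsigned char>(Proto[I]))) {
    AddrSpace = AddrSpace * 10 + (Proto[I] - '0');
    ++I;
  }
  return AddrSpace;
}
```

With this helper, the prototype "dd*1di" from the hunk above yields 1 (global),
whereas a flat builtin would be expected to yield 0, for example with "dd*0di".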


================
Comment at: clang/include/clang/Basic/BuiltinsAMDGPU.def:210
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f64, "dd*3di", "t", "gfx90a-insts")
+TARGET_BUILTIN(__builtin_amdgcn_ds_atomic_fadd_f32, "ff*3fi", "t", "gfx90a-insts")
+
----------------
This has been available since gfx8. Use the gfx8-insts attribute.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16212
+  case AMDGPU::BI__builtin_amdgcn_flat_atomic_fmax_f64: {
+    Intrinsic::ID IID;
+    llvm::Type *ArgTy = llvm::Type::getDoubleTy(getLLVMContext());
----------------
You do not need any of that code. You can map a builtin directly to an
intrinsic in IntrinsicsAMDGPU.td.


================
Comment at: clang/test/CodeGenOpenCL/builtins-fp-atomics.cl:112
+kernel void test_flat_global_max(__global double *addr, double x){
+  __builtin_amdgcn_flat_atomic_fmax_f64(addr, x, memory_order_relaxed);
+}
----------------
arsenm wrote:
> gandhi21299 wrote:
> > arsenm wrote:
> > > If you're going to bother testing the ISA, is it worth testing rtn and no 
> > > rtn versions?
> > Sorry, what do you mean by rtn version?
> Most atomics can be optimized to not return the in-memory value if that
> value is unused
Certainly yes, because global_atomic_add_f32 did not have a return version on
gfx908.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106909/new/

https://reviews.llvm.org/D106909

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
