[libclc] [libclc] Improve performance and precision of reciprocal functions (PR #203805)

Joseph Huber via cfe-commits Mon, 15 Jun 2026 05:27:18 -0700

================
@@ -0,0 +1,16 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// On AMDGPU the "fast" reciprocal is the hardware v_rcp_f32 approximation,
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE __clc_recip_fast(__CLC_GENTYPE x) {
+#if defined(__CLC_SCALAR) && __CLC_FPSIZE == 32
+  return __builtin_amdgcn_rcpf(x);
----------------
jhuber6 wrote:


I think it only does that when denormal flushing is allowed. The main reason,
though, is that without this and the later f64 version in
https://github.com/llvm/llvm-project/pull/203809, performance was worse and
the results didn't match. That's mainly what I was testing against.
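
To make the denormal-flushing point concrete, here is a minimal host-side sketch in plain C. It is not the libclc code and does not model `v_rcp_f32` itself: `flush_denormal`, `recip_ieee`, and `recip_ftz` are hypothetical helpers that only illustrate why a reciprocal that flushes subnormal inputs can disagree with an IEEE division, assuming a host with default (non-flushing) float behavior.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: model flush-to-zero by replacing a subnormal
   input with a zero of the same sign. Normal values pass through. */
static float flush_denormal(float x) {
  return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

/* Reference reciprocal: an ordinary IEEE division that honors
   subnormal inputs. */
static float recip_ieee(float x) { return 1.0f / x; }

/* Reciprocal with the input flushed first, standing in for a mode
   where denormal flushing is allowed (an assumption for illustration,
   not a description of the hardware instruction). */
static float recip_ftz(float x) { return 1.0f / flush_denormal(x); }
```

For a subnormal such as `0x1p-127f`, `recip_ieee` returns the finite value `0x1p127f`, while `recip_ftz` divides by zero and returns infinity. For normal inputs the two agree, so a mismatch of this kind only shows up when subnormals are in play.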

https://github.com/llvm/llvm-project/pull/203805