================
@@ -0,0 +1,16 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// On AMDGPU the "fast" reciprocal is the hardware v_rcp_f32 approximation,
+_CLC_OVERLOAD _CLC_DEF __CLC_GENTYPE __clc_recip_fast(__CLC_GENTYPE x) {
+#if defined(__CLC_SCALAR) && __CLC_FPSIZE == 32
+  return __builtin_amdgcn_rcpf(x);
----------------
jhuber6 wrote:

I think it only does that when denormal flushing is allowed. The main reason, though, is that without this and the later f64 version in https://github.com/llvm/llvm-project/pull/203809, performance was worse and the results didn't match. That's mainly what I was testing against.

https://github.com/llvm/llvm-project/pull/203805
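
One way to quantify "the results didn't match" is a host-side ULP comparison between a reciprocal value read back from the device and the correctly rounded `1.0f / x`. The following is a minimal sketch in plain C, not code from either PR: `ordered_bits`, `ulp_distance`, and `recip_matches` are hypothetical helper names, and the tolerance is left as a parameter because the comment above does not state one.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Map a float's bit pattern onto an ordered integer line so that adjacent
   representable floats differ by exactly 1 (hypothetical helper). */
int64_t ordered_bits(float f) {
  int32_t i;
  memcpy(&i, &f, sizeof i); /* well-defined type pun */
  /* Negative floats are stored sign-magnitude; mirror them below zero.
     Both +0.0f and -0.0f map to 0. */
  return i < 0 ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Distance between two non-NaN floats, counted in representable values. */
int64_t ulp_distance(float a, float b) {
  assert(!isnan(a) && !isnan(b));
  int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

/* Returns nonzero if `approx` is within `max_ulp` of the host's 1.0f / x.
   `approx` is assumed to be the value produced by the implementation
   under test. */
int recip_matches(float approx, float x, int64_t max_ulp) {
  return ulp_distance(approx, 1.0f / x) <= max_ulp;
}
```

Calling `recip_matches(result, x, 0)` demands a bit-exact match with the host division, while a larger `max_ulp` accepts an approximate reciprocal. Denormal inputs or results would need separate handling if the device flushes them to zero and the host does not.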
