If the fma has the exact flag set, then we still need to use the llvm.fma intrinsic. Those come from fma() calls with the precise or invariant qualifiers in GLSL, where you have to either fuse everything or fuse nothing, consistently, and llvm.fmuladd doesn't guarantee that.
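
A minimal sketch of that selection in plain C, to make the rule concrete. It is not the actual Mesa code: struct alu_instr_sketch and ffma_intrinsic_name() are hypothetical stand-ins, and the only assumption carried over is that the ALU instruction exposes a boolean exact flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the ALU instruction; only the flag that
 * matters for this decision is modelled here. */
struct alu_instr_sketch {
    bool exact; /* set for precise/invariant fma() calls */
};

/* Pick the LLVM intrinsic name for an ffma.
 * exact     -> "llvm.fma": always a single fused operation.
 * not exact -> "llvm.fmuladd": the backend may fuse or split,
 *              whichever is better for the target hardware. */
static const char *ffma_intrinsic_name(const struct alu_instr_sketch *instr)
{
    return instr->exact ? "llvm.fma" : "llvm.fmuladd";
}

/* Small self-check of the two cases. */
static void check_ffma_selection(void)
{
    struct alu_instr_sketch precise = { .exact = true };
    struct alu_instr_sketch relaxed = { .exact = false };

    assert(strcmp(ffma_intrinsic_name(&precise), "llvm.fma") == 0);
    assert(strcmp(ffma_intrinsic_name(&relaxed), "llvm.fmuladd") == 0);
}
```

In the patch itself this would amount to passing the chosen name to emit_intrin_3f_param() in the nir_op_ffma case instead of a fixed string.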

On Tue, Oct 3, 2017 at 10:10 PM, Dave Airlie <[email protected]> wrote:
> From: Dave Airlie <[email protected]>
>
> For Vulkan SPIR-V the spec states
> fma() Inherited from OpFMul followed by OpFAdd.
>
> Matt says the backend will do the right thing depending on the
> hardware being compiled for, if you use the fmuladd intrinsic.
>
> Using the Mad Max pts test, on high settings at 4K:
> CHP: 55->60
> HGDD: 46->50
> LM: 55->60
> No change on Stronghold.
>
> Thanks to Feral for spending the time to track this down.
>
> Signed-off-by: Dave Airlie <[email protected]>
> ---
>  src/amd/common/ac_nir_to_llvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index d7b6259..11ba487 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -1707,7 +1707,7 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr)
>                                               result);
>                 break;
>         case nir_op_ffma:
> -               result = emit_intrin_3f_param(&ctx->ac, "llvm.fma",
> +               result = emit_intrin_3f_param(&ctx->ac, "llvm.fmuladd",
>                                               ac_to_float_type(&ctx->ac, def_type), src[0], src[1], src[2]);
>                 break;
>         case nir_op_ibitfield_extract:
> --
> 2.9.4
>
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
