Hi Jiangning,

> I see your point, so do you mean we must generate fmla instruction for
> intrinsic function vfma_lane_f32(), no matter if it is in -ffast-math mode or
> not? Then I think we have to generate fmls for intrinsic function
> vfms_lane_f32() as well.

I believe so.

> I don't see LLVM IR has @llvm.fms.* defined, so we have to define an aarch64
> specific LLVM intrinsic, or we can use an expression containing llvm.fma.* to
> represent it?

I think I worked out that it was equivalent to @llvm.fma(-x, y, z) (and
@llvm.fma(x, -y, z)). The negation is exact, and the fusing works out to be
the same for "z + (-x)*y" as for "z - x*y".

By the way, be wary of the operand order. @llvm.fma(x, y, z) calculates
"x*y + z", but "fmla x, y, z" calculates "x + y*z". I *think* both Ana and I
got that wrong at least once. I know I did.

Cheers.

Tim.
