https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81904
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Hmm, I think the issue is we see
>
> f (__m128d x, __m128d y, __m128d z)
> {
>   vector(2) double _4;
>   vector(2) double _6;
>
>   <bb 2> [100.00%]:
>   _4 = x_2(D) * y_3(D);
>   _6 = __builtin_ia32_addsubpd (_4, z_5(D)); [tail call]

We can fold the builtin into .VEC_ADDSUB, and optimize MUL + VEC_ADDSUB ->
VEC_FMADDSUB in match.pd?
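
To make the proposed rewrite concrete, here is a small scalar model in C of the lane semantics involved. It is only a sketch and not GCC code: the type `v2df` and the helpers `v2df_mul`, `v2df_addsub`, `v2df_fmaddsub` and `addsub_fold_self_check` are hypothetical names made up for illustration. It assumes the usual addsub convention (even lane subtracts, odd lane adds) and shows that MUL followed by ADDSUB computes the same lanes as a single FMADDSUB.

```c
#include <assert.h>

/* Two-lane double vector standing in for __m128d (hypothetical type). */
typedef struct { double lane[2]; } v2df;

/* Lane-wise multiply, modelling `_4 = x * y`. */
v2df v2df_mul(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] * b.lane[0], a.lane[1] * b.lane[1] }};
    return r;
}

/* Model of addsubpd / VEC_ADDSUB: even lane subtracts, odd lane adds. */
v2df v2df_addsub(v2df a, v2df b)
{
    v2df r = {{ a.lane[0] - b.lane[0], a.lane[1] + b.lane[1] }};
    return r;
}

/* Model of VEC_FMADDSUB: a*b - c in the even lane, a*b + c in the odd lane.
   A real fused instruction rounds once; this model makes no such promise,
   so it only illustrates the lane pattern, not the exact rounding. */
v2df v2df_fmaddsub(v2df a, v2df b, v2df c)
{
    v2df r = {{ a.lane[0] * b.lane[0] - c.lane[0],
                a.lane[1] * b.lane[1] + c.lane[1] }};
    return r;
}

/* Checks MUL + ADDSUB against FMADDSUB on exactly representable inputs.
   Returns 1 when both forms agree in every lane. */
int addsub_fold_self_check(void)
{
    v2df x = {{ 3.0, 5.0 }};
    v2df y = {{ 2.0, 4.0 }};
    v2df z = {{ 1.0, 7.0 }};

    v2df two_step = v2df_addsub(v2df_mul(x, y), z);
    v2df fused    = v2df_fmaddsub(x, y, z);

    assert(two_step.lane[0] == fused.lane[0]);
    assert(two_step.lane[1] == fused.lane[1]);
    return 1;
}
```

The inputs are small integers so that every intermediate is exact. With arbitrary inputs the fused form can differ from the two-step form in the last bit because it rounds only once, which is presumably why such a pattern would need to respect the FP contraction setting (-ffp-contract), as other FMA formation does.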