double bar(double x, double y)
{
  double tmp = 0.1234 * y;
  return ((x + tmp) * (x - tmp));
}
GCC should use fused multiply-add and multiply-subtract instructions when
that is cheaper than one multiplication followed by an addition and a
subtraction.
With -mfma4 on x86_64 instead of
        vmulsd  .LC0(%rip), %xmm1, %xmm1
        vaddsd  %xmm1, %xmm0, %xmm2
        vsubsd  %xmm1, %xmm0, %xmm0
        vmulsd  %xmm0, %xmm2, %xmm0
it should generate
        vmovsd  .LC0(%rip), %xmm3
        vfmaddsd        %xmm0, %xmm3, %xmm1, %xmm2
        vfnmaddsd       %xmm0, %xmm3, %xmm1, %xmm0
        vmulsd  %xmm0, %xmm2, %xmm0
See also PR19988.
FMA opportunities of this kind should probably be detected during RTL
expansion, similar to widening multiplications.
--
Summary: FMAs not exploited
Product: gcc
Version: 4.5.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rguenth at gcc dot gnu dot org
GCC target triplet: powerpc64-*-*, x86_64-*-*
OtherBugsDependingOnThis: 19988
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42802