This patch tries to fix the 2% regression in 510.parest_r on ampere1 reported in the tracker (PR tree-optimization/110279).
(The previous discussion is here: https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)
1. Add testcases for the problem. For an op list in the form of
"acc = a * b + c * d + acc", reassociation currently does not swap the
operands, so more FMAs end up being generated. After widening_mul the
result looks like:

  _1 = .FMA (a, b, acc_0);
  acc_1 = .FMA (c, d, _1);

whereas previously (before the "Handle FMA friendly..." patch),
widening_mul's result was:

  _1 = a * b;
  _2 = .FMA (c, d, _1);
  acc_1 = acc_0 + _2;

If the code fragment is in a loop, some architectures can execute the
latter in parallel, so it can run much faster than the former: in the
former, both FMAs depend on acc from the previous iteration, while in
the latter only the final addition does. For the small testcase, the
performance gap is over 10% on both ampere1 and neoverse-n1. So the
point here is to avoid turning the last statement into an FMA and to
keep it a PLUS_EXPR as much as possible. (If we are rewriting the op
list into parallel, no special treatment is needed, since the last
statement after rewrite_expr_tree_parallel will be a PLUS_EXPR anyway.)

2. The new function result_feeds_back_from_phi_p checks for
cross-backedge dependency, and the new enum fma_state describes the
state of FMA candidates.

With this patch, there's a 3% improvement in the 510.parest_r 1-copy
run on ampere1. The compile options are:
"-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".

Best regards,
Di Zhao

----

	PR tree-optimization/110279

gcc/ChangeLog:

	* tree-ssa-reassoc.cc (enum fma_state): New enum to describe the
	state of FMA candidates for an op list.
	(rewrite_expr_tree_parallel): Changed boolean parameter to enum
	type.
	(result_feeds_back_from_phi_p): New function to check for cross
	backedge dependency.
	(rank_ops_for_fma): Return enum fma_state. Added new parameter.
	(reassociate_bb): If there's backedge dependency in an op list,
	swap the operands before rewrite_expr_tree.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr110279.c: New test.
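The two dataflow shapes from point 1 can be sketched in plain C as below. This is a hypothetical illustration, not the contents of gcc.dg/pr110279.c: the function names accumulate_chained and accumulate_split are made up, and the source order only documents the intended dependency chains. At -Ofast, reassoc and widening_mul are free to rewrite either form, which is exactly the decision this patch influences. The comments assume FP contraction is enabled (e.g. -ffp-contract=fast on a target with FMA).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the shape after the "Handle FMA friendly..."
   patch:
     _1 = .FMA (a, b, acc_0);
     acc_1 = .FMA (c, d, _1);
   Both operations are on the loop-carried path through acc, so each
   iteration waits for two dependent FMA latencies.  */
double
accumulate_chained (const double *a, const double *b,
                    const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      double t = a[i] * b[i] + acc; /* may contract to .FMA (a, b, acc) */
      acc = c[i] * d[i] + t;        /* may contract to .FMA (c, d, t) */
    }
  return acc;
}

/* Hypothetical sketch of the shape the patch tries to keep:
     _1 = a * b;
     _2 = .FMA (c, d, _1);
     acc_1 = acc_0 + _2;
   The multiply and the FMA do not depend on acc, so they can overlap
   with earlier iterations; only the final PLUS_EXPR is on the
   loop-carried path.  */
double
accumulate_split (const double *a, const double *b,
                  const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      double t = c[i] * d[i] + a[i] * b[i]; /* independent of acc */
      acc = acc + t;                        /* only op on the backedge path */
    }
  return acc;
}
```

For exactly representable inputs (small integers, say) both functions return the same value; the difference lies only in the per-iteration latency through acc (two chained FMAs versus a single addition), which is what the cross-backedge check is meant to detect.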
0001-swap-operands-in-reassoc-to-reduce-cross-backedge-FM.patch