This patch tries to fix the 2% regression in 510.parest_r on
ampere1 in the tracker. (Previous discussion is here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624893.html)

1. Add testcases for the problem. For an op list in the form of
"acc = a * b + c * d + acc", reassociation currently leaves the
operands unswapped so that more FMAs can be generated.
After widening_mul the result looks like:

   _1 = .FMA (a, b, acc_0);
   acc_1 = .FMA (c, d, _1);

While previously (before the "Handle FMA friendly..." patch),
widening_mul's result was like:

   _1 = a * b;
   _2 = .FMA (c, d, _1);
   acc_1 = acc_0 + _2;

If the code fragment is in a loop, some architectures can execute
the latter in parallel, so it can be much faster than the former.
In the latter form only the final addition depends on acc_0 from
the previous iteration, so the multiply and the FMA of the next
iteration can start before that addition completes. For the small
testcase, the performance gap is over 10% on both ampere1 and
neoverse-n1. So the point here is to avoid turning the last
statement into an FMA, and to keep it a PLUS_EXPR as much as
possible. (If we are rewriting the op list in parallel, no special
treatment is needed, since the last statement after
rewrite_expr_tree_parallel will be a PLUS_EXPR anyway.)
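
For illustration only, the two shapes can be written out by hand in C.
This is a sketch, not the testcase from the patch: the function names
sum2_chained, sum2_final_plus and check_forms_agree are made up here,
and whether the compiler actually contracts each multiply-add into an
FMA depends on the target and on flags such as -ffp-contract.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: both products are folded into the accumulator
   one after another.  If each multiply-add is contracted, every FMA
   depends on acc from the previous iteration, so the loop-carried
   chain is two FMAs long.  */
double
sum2_chained (const double *a, const double *b,
              const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      double t = a[i] * b[i] + acc;   /* like _1 = .FMA (a, b, acc_0) */
      acc = c[i] * d[i] + t;          /* like acc_1 = .FMA (c, d, _1) */
    }
  return acc;
}

/* Hypothetical helper: the products are combined first and acc is
   only touched by the final addition, so the loop-carried chain is a
   single add.  */
double
sum2_final_plus (const double *a, const double *b,
                 const double *c, const double *d, size_t n)
{
  double acc = 0.0;
  for (size_t i = 0; i < n; i++)
    {
      double t1 = a[i] * b[i];         /* like _1 = a * b */
      double t2 = c[i] * d[i] + t1;    /* like _2 = .FMA (c, d, _1) */
      acc = acc + t2;                  /* like acc_1 = acc_0 + _2 */
    }
  return acc;
}

/* Small-integer inputs are exactly representable, so both forms must
   agree regardless of whether contraction happens.  */
void
check_forms_agree (void)
{
  const double a[] = { 1.0, 2.0, 3.0 };
  const double b[] = { 4.0, 5.0, 6.0 };
  const double c[] = { 1.0, 1.0, 1.0 };
  const double d[] = { 2.0, 2.0, 2.0 };
  assert (sum2_chained (a, b, c, d, 3) == sum2_final_plus (a, b, c, d, 3));
}
```

The two functions compute the same sum for exactly representable
inputs; they differ only in how long the dependency on acc is across
iterations, which is the property the patch is concerned with.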

2. Add function result_feeds_back_from_phi_p to check for cross
backedge dependency, and a new enum fma_state to describe the
state of FMA candidates.

With this patch, there's a 3% improvement in the 510.parest_r
1-copy run on ampere1. The compile options are:
"-Ofast -mcpu=ampere1 -flto --param avoid-fma-max-bits=512".

Best regards,
Di Zhao

----

        PR tree-optimization/110279

gcc/ChangeLog:

        * tree-ssa-reassoc.cc (enum fma_state): New enum to
        describe the state of FMA candidates for an op list.
        (rewrite_expr_tree_parallel): Change boolean
        parameter to enum type.
        (result_feeds_back_from_phi_p): New function to check
        for cross backedge dependency.
        (rank_ops_for_fma): Return enum fma_state. Add new
        parameter.
        (reassociate_bb): If there's backedge dependency in an
        op list, swap the operands before rewrite_expr_tree.

gcc/testsuite/ChangeLog:

        * gcc.dg/pr110279.c: New test.

Attachment: 0001-swap-operands-in-reassoc-to-reduce-cross-backedge-FM.patch
