[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons

cvs-commit at gcc dot gnu.org via Gcc-bugs Fri, 14 Jul 2023 03:23:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154


--- Comment #66 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:d8f5e349772b6652bddb0620bb178290905998b9

commit r14-2516-gd8f5e349772b6652bddb0620bb178290905998b9
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Fri Jul 14 11:21:12 2023 +0100

    ifcvt: Reduce comparisons on conditionals by tracking truths [PR109154]

    Following on from Jakub's patch in
    g:de0ee9d14165eebb3d31c84e98260c05c3b33acb, these two patches finish the
    work of fixing the regression and improve codegen.

    As explained in that commit, ifconvert sorts PHI args in increasing number
    of occurrences in order to reduce the number of comparisons done while
    traversing the tree.
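
The ordering described above can be illustrated with a small sketch. This is not the code in tree-if-conv.cc: the `arg_count` struct and the `sort_phi_args` helper are hypothetical names, and PHI arguments are modelled as plain integers purely for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one distinct PHI argument.  */
struct arg_count
{
  int value;        /* the argument (an SSA name or constant in GCC) */
  int occurrences;  /* how many incoming edges carry it */
};

/* qsort comparator: fewest occurrences first, ties broken by value.  */
static int
cmp_occurrences (const void *pa, const void *pb)
{
  const struct arg_count *a = pa, *b = pb;
  if (a->occurrences != b->occurrences)
    return a->occurrences < b->occurrences ? -1 : 1;
  return (a->value > b->value) - (a->value < b->value);
}

/* Collapse ARGS[0..N) into distinct values with their counts in OUT
   (which must have room for N entries), sorted by increasing
   occurrence count.  Returns the number of distinct values.  */
static int
sort_phi_args (const int *args, int n, struct arg_count *out)
{
  int distinct = 0;
  for (int i = 0; i < n; i++)
    {
      int j;
      for (j = 0; j < distinct; j++)
        if (out[j].value == args[i])
          break;
      if (j == distinct)
        {
          out[distinct].value = args[i];
          out[distinct].occurrences = 0;
          distinct++;
        }
      out[j].occurrences++;
    }
  qsort (out, distinct, sizeof *out, cmp_occurrences);
  return distinct;
}
```

With the rarest arguments first, the most common argument ends up last in the chain, where it can serve as the final "else" value without a comparison of its own.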

    The remaining task that this patch fixes is dealing with the long chain of
    comparisons that can be created from PHI nodes, particularly when they
    share a common successor (the classical example is a diamond node).

    On a PHI node the true and false branches each carry a condition: the true
    branch carries `a` and the false branch `~a`.  The issue is that at the
    moment GCC tests both `a` and `~a` when the PHI node has more than 2
    arguments.  Clearly this isn't needed.  The deeper the nesting of PHI
    nodes, the larger the repetition.

    As an example, for

    void foo (int *f, int d, int e)
    {
      for (int i = 0; i < 1024; i++)
        {
          int a = f[i];
          int t;
          if (a < 0)
            t = 1;
          else if (a < e)
            t = 1 - a * d;
          else
            t = 0;
          f[i] = t;
        }
    }

    After Jakub's patch we generate:

      _7 = a_10 < 0;
      _21 = a_10 >= 0;
      _22 = a_10 < e_11(D);
      _23 = _21 & _22;
      _ifc__42 = _23 ? t_13 : 0;
      t_6 = _7 ? 1 : _ifc__42;

    This is better than before but still inefficient: in the false branch,
    where we know ~_7 is true, we still test _21, which is exactly ~_7.
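
The redundancy can be checked with a minimal C sketch that mirrors the select chain by hand. The function names are hypothetical, and the variables `c7`, `c21` and `c22` stand in for the GIMPLE names `_7`, `_21` and `_22`; `t13` plays the role of `t_13`.

```c
#include <stdbool.h>

/* The select chain with the inner predicate written as c21 & c22.  */
static int
select_with_redundant_test (int a, int e, int t13)
{
  bool c7 = a < 0;
  bool c21 = a >= 0;   /* always true whenever c7 is false */
  bool c22 = a < e;
  bool c23 = c21 & c22;
  int ifc = c23 ? t13 : 0;
  return c7 ? 1 : ifc;
}

/* The same chain with the inner predicate reduced to c22 alone.  */
static int
select_simplified (int a, int e, int t13)
{
  bool c7 = a < 0;
  bool c22 = a < e;
  int ifc = c22 ? t13 : 0;
  return c7 ? 1 : ifc;
}
```

The two functions can only differ when `c7` is false and `c21` is false at the same time, which cannot happen for integer `a`, so they return the same value for every input.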

    The same superfluous test is emitted for every diamond node.  After this
    patch we generate

     _7 = a_10 < 0;
     _22 = a_10 < e_11(D);
     _ifc__42 = _22 ? t_13 : 0;
     t_6 = _7 ? 1 : _ifc__42;

    This correctly elides the test of _21.  It is done by borrowing the
    vectorizer's helper functions to limit predicate mask usages.  Ifcvt will
    chain conditionals on the false edge (unless specifically inverted), so
    this patch, on creating the conditional a ? b : c, registers ~a when
    traversing c.  If c is a conditional then c will be simplified to the
    smallest possible predicate given the assumptions we already know to be
    true.
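
The idea of registering known truths while walking the false edge can be sketched in a toy model. This is an illustration under stated assumptions, not the implementation of gen_simplified_condition: predicates are modelled as conjunctions of literals stored in bitmasks, the `pred` struct and both helpers are hypothetical, and only single-literal conditions are registered as known on the false edge.

```c
/* A conjunction of literals over at most 32 conditions: bit i of POS
   means "condition i is true", bit i of NEG means "condition i is
   false".  An all-zero predicate is trivially true.  */
struct pred
{
  unsigned pos;
  unsigned neg;
};

/* Drop from P every literal that KNOWN already guarantees.  */
static struct pred
simplify_pred (struct pred p, struct pred known)
{
  p.pos &= ~known.pos;
  p.neg &= ~known.neg;
  return p;
}

/* Walk the chain c0 ? v0 : (c1 ? v1 : ...) stored in CHAIN[0..N),
   simplifying each condition in place using what the enclosing false
   edges already established.  */
static void
simplify_chain (struct pred *chain, int n)
{
  struct pred known = { 0, 0 };
  for (int i = 0; i < n; i++)
    {
      chain[i] = simplify_pred (chain[i], known);
      struct pred c = chain[i];
      /* Entering the false arm of C.  The negation of a single literal
         is again a single literal, so it can be recorded; wider
         conjunctions are skipped in this toy model.  */
      if (c.neg == 0 && c.pos != 0 && (c.pos & (c.pos - 1)) == 0)
        known.neg |= c.pos;
      else if (c.pos == 0 && c.neg != 0 && (c.neg & (c.neg - 1)) == 0)
        known.pos |= c.neg;
    }
}
```

For the example in this commit, let condition 0 be `a < 0` and condition 1 be `a < e`. The chain starts as `{pos = 1}` followed by `{pos = 2, neg = 1}`, and after `simplify_chain` the second entry is reduced to `{pos = 2, neg = 0}`, matching the elided test of _21.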

    gcc/ChangeLog:

            PR tree-optimization/109154
            * tree-if-conv.cc (gen_simplified_condition,
            gen_phi_nest_statement): New.
            (gen_phi_arg_condition, predicate_scalar_phi): Use it.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/109154
            * gcc.dg/vect/vect-ifcvt-19.c: New test.

[Bug tree-optimization/109154] [13/14 regression] jump threading de-optimizes nested floating point comparisons
