On 6/11/20 6:01 AM, Zhanghaijian (A) wrote:
> Hi,
>
> This is a experimental fix for pr94274.
> For if/else structure, when the expressions is binary operation and have a 
> common subexpression, and the opcode is the same, we can
> fold the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can be 
> optimized to do csel first and then do binary operations.
> This can eliminate one or more instructions. And this patch is effective for 
> 500.perlbench_r in spec2017.
> Bootstrap and tested on aarch64/x86_64 Linux platform. No new regression 
> witnessed.
>
> Any suggestion?  
>
> Thanks,
> Haijian Zhang
>
> pr94274-v1.diff
>
> From 1f0f09ef3170569ef15791cf3e70de781a9a4cb0 Mon Sep 17 00:00:00 2001
> From: Haijian Zhang <z.zhanghaij...@huawei.com>
> Date: Thu, 11 Jun 2020 14:56:44 +0800
> Subject: [PATCH] fold phi whose incoming args are defined from binary
>  operations [PR94274]
>
> For if/else structure, when the expressions is binary operation and
> have a common subexpression, and the opcode is the same. We can fold
> the merging phi node in tree_ssa_phiopt_worker (ssa-phiopt). It can
> be optimized to do csel first and then do binary operations. This can
> eliminate one or more instructions.
>
> 2020-06-11  Haijian Zhang  <z.zhanghaij...@huawei.com>
>
> gcc/
>
>       PR tree-optimization/94274
>       * common.opt (ftree-combine): New option.
>       * doc/invoke.texi (-ftree-combine): Document new option.
>       * opts.c (default_options_table): Enable -ftree-combine at -O1.
>       * passes.def: Add pass_adjust_alignment.
>       * tree-pass.h (make_pass_tree_combine): New.
>       * tree-ssa-phiopt.c (pass_data_tree_combine): New.
> gcc/testsuite/
>       PR tree-optimization/94274
>       * gcc.dg/tree-ssa/pr94274-1.c: New test.
>       * gcc.dg/tree-ssa/pr94274-2.c: Likewise.
>       * gcc.dg/tree-ssa/pr94274-3.c: Likewise.
I like a lot of what I see in here.   Do you have a copyright assignment
and employer disclaimer on file with the FSF?  If not, then we can't
take the submission at this time.


> ---
>  gcc/ChangeLog                             |  10 +
>  gcc/common.opt                            |   4 +
>  gcc/doc/invoke.texi                       |  15 +-
>  gcc/opts.c                                |   1 +
>  gcc/passes.def                            |   1 +
>  gcc/testsuite/ChangeLog                   |   7 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c |  15 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c |  26 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c |  16 +
>  gcc/tree-pass.h                           |   1 +
>  gcc/tree-ssa-phiopt.c                     | 352 +++++++++++++++++++++-
>  11 files changed, 440 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr94274-3.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index b0860738d04..c1a5bf87626 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,13 @@
> +2020-06-11  Haijian Zhang  <z.zhanghaij...@huawei.com>
> +
> +     PR tree-optimization/94274
> +     * common.opt (ftree-combine): New option.
> +     * doc/invoke.texi (-ftree-combine): Document new option.
> +     * opts.c (default_options_table): Enable -ftree-combine at -O1.
> +     * passes.def: Add pass_adjust_alignment.
> +     * tree-pass.h (make_pass_tree_combine): New.
> +     * tree-ssa-phiopt.c (pass_data_tree_combine): New.
You should reference tree-optimization/64700 as well.   While your patch
doesn't address all the issues in 64700, it's an excellent start.

> +
>  2020-06-10  Martin Sebor  <mse...@redhat.com>
>  
>       PR middle-end/95353
> diff --git a/gcc/common.opt b/gcc/common.opt
> index df8af365d1b..f4441aa9ddd 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2721,6 +2721,10 @@ ftree-cselim
>  Common Report Var(flag_tree_cselim) Init(2) Optimization
>  Transform condition stores into unconditional ones.
>  
> +ftree-combine
> +Common Report Var(flag_tree_combine) Optimization
> +Combine binary operations using SSA PHI nodes.
I probably would bother with a switch for this.  There's already a flag
to control phi-opt as a whole and that seems fine for this change as well.

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
> new file mode 100644
> index 00000000000..f883c6a3201
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94274-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-combine" } */
> +
> +int foo (int cond, int a, int b, int c)
> +{
> +  int result = 0;
> +
> +  if (cond == 1)
> +    result = b + a;
> +  else
> +    result = a + c;
> +  return result;
> +}
You should look at comment #3 in pr64700 which I think has a couple more
test cases.diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c


> index 5f283890998..9b09330c026 100644
> --- a/gcc/tree-ssa-phiopt.c
> +++ b/gcc/tree-ssa-phiopt.c
> @@ -48,7 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-eh.h"
>  #include "gimple-fold.h"
>  
> -static unsigned int tree_ssa_phiopt_worker (bool, bool, bool);
> +static unsigned int tree_ssa_phiopt_worker (bool, bool, bool, bool);
>  static bool two_value_replacement (basic_block, basic_block, edge, gphi *,
>                                  tree, tree);
So if we don't make this a separate pass, but instead have it run as a
standard part of the phi-opt optimizations, then I don't think you need
an argument to control the behavior.

> @@ -155,14 +155,262 @@ single_non_singleton_phi_for_edges (gimple_seq seq, 
> edge e0, edge e1)
>    return phi;
>  }
>  
> +/* Compare the operands of stmts.  Return 1 if only rhs1 is different.
> +   Return 2 if only rhs2 is different.  Otherwise return false.  */
> +
> +static unsigned
> +compare_operands (gimple *first_stmt, tree rhs1, tree rhs2)
So the comment here could be better.  "If only rhs1 is different." --
different from what?, similarly for the comment about rhs2.  I'd use "0"
instead of false -- we generally use true/false for booleans, mixing
them could confuse folks.  You could make an enum for the 3 return
states and use that instead.
> +/* Check all stmts for phis, return true if the stmts are binary operation,
> +   and have the same opcode.  Otherwise return false.  */
> +
> +static bool
> +check_stmt_for_phi (gimple *phi)
> +{
> +  gimple *first_stmt;
> +  enum tree_code opcode;
> +  tree arg;
> +
> +  /* First clear flag for stmts which used to track of which operand
> +     need phi node.  */
> +  gimple_set_plf (phi, GF_PLF_1, false);
> +  gimple_set_plf (phi, GF_PLF_2, false);
> +
> +  if (gimple_code (phi) != GIMPLE_PHI)
> +    return false;
So this function is called within a loop over the PHIs.  THere should be
no case when gimple_code (phi) != GIMPLE_PHI.  I'd just drop that test.

If you want checking, then change the argument to a gphi* and you'll get
static checking at the call site.


+ /* Check the stmt is GIMPLE_ASSIGN, and the stmt is binary operation. */
> +  if (!first_stmt || !is_gimple_assign (first_stmt)
> +      || gimple_assign_rhs_class (first_stmt) != GIMPLE_BINARY_RHS
> +      || !has_single_use (gimple_assign_lhs (first_stmt)))
> +    return false;
So I don't think it should be a requirement for this patch to move
forward, but you could easily handle unary operations here too.  I'm
thinking about stuff like casts and the like.  But it's also a step
towards handling stuff like INDIRECT_REF.

You may want to verify that the defining statements do not have any
virtual operands.   It's just the safe thing to do.


> +
> +/* Check all operands of stmts, keep track of which operand needs a phi.
> +   Return true if the phis can be replaced, otherwise return false.  */
> +
> +static bool
> +check_operands_for_stmt (gimple *phi)
> +{
> +  tree arg = PHI_ARG_DEF (phi, 0);
> +  gimple *first_stmt = SSA_NAME_DEF_STMT (arg);
> +
> +  enum tree_code opcode = gimple_assign_rhs_code (first_stmt);
> +  tree rhs1 = gimple_assign_rhs1 (first_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (first_stmt);
> +
> +  /* Scan all args to find which can do the replacement.  */
> +  for (unsigned i = 1; i != gimple_phi_num_args (phi); i++)
> +    {
> +      gimple *stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, i));
> +      unsigned rhschanged = 0;
> +
> +      rhschanged = compare_operands (first_stmt,
> +                                  gimple_assign_rhs1 (stmt),
> +                                  gimple_assign_rhs2 (stmt));
So you earlier verified that FIRST_STMT is a GIMPLE_BINARY_RHS.  But you
don't do it for the defining statements of subsequent operands in the
PHI node.   You need to verify the basic form STMT to ensure it's the
same as FIRST_STMT before you call gimple_assign_rhs1 (stmt) and
gimple_assign_rhs2 (stmt).

+
> +/* The function binary_rhs_replacement does the main work of doing the
> +   replacement of binary operation.  */
> +
> +static void
> +binary_rhs_replacement (gimple *phi)
> +{
> +  edge e;
> +  gphi *newphi = NULL;
> +  gimple *first_stmt = SSA_NAME_DEF_STMT (PHI_ARG_DEF (phi, 0));
> +  tree rhs1 = gimple_assign_rhs1 (first_stmt);
> +  tree rhs2 = gimple_assign_rhs2 (first_stmt);
> +
> +  /*  Create a new PHI for replacement.  */
> +  if (gimple_plf (phi, GF_PLF_1))
> +    {
> +      tree temp = make_ssa_name (TREE_TYPE (rhs1), NULL);
> +      newphi = create_phi_node (temp, gimple_bb (phi));
> +      e = gimple_phi_arg_edge (newphi, 0);
> +      add_phi_arg (newphi, gimple_assign_rhs1 (first_stmt), e,
> +                UNKNOWN_LOCATION);
> +    }
> +  else if (gimple_plf (phi, GF_PLF_2))
> +    {
> +      tree temp = make_ssa_name (TREE_TYPE (rhs2), NULL);
> +      newphi = create_phi_node (temp, gimple_bb (phi));
> +      e = gimple_phi_arg_edge (newphi, 0);
> +      add_phi_arg (newphi, gimple_assign_rhs2 (first_stmt), e,
> +                UNKNOWN_LOCATION);
> +    }
I would think we could do better than UNKNOWN_LOCATION here.  We should
have locations in the old phi as well as the statements in bb1/bb2.  Any
would be better than UNKNOWN_LOCATION here.  Similarly for the rest of
the PHI arguments.

At a high level, does your patch handle the case where one of the PHI
arguments is defined by another PHI in the same block?  This can happen
in loops and the semantics are that all the reads of PHI arguments
happen at once and assignments to all the PHI results happen after that,
and effectively at the same time.  If this scenario is detected, I'd
just avoid trying to do the optimization.

And a nit.  Please check for consistent use of tabs and spaces. 
Formatting guidelines for GCC are to use tabs instead of 8 spaces.  I
think the new code you're adding is inconsistent.  As an example look at
the 3 lines that start with "/* Remove the first statement.  */ in
binary_rhs_replacement.


So I think this needs another iteration, but given it was submitted in
June, I think it should still be considered for gcc-11, particularly if
the copyright assignment and technical details can be worked out
relatively quickly.

Thanks!

jeff

Reply via email to