On Fri, Oct 27, 2023 at 3:51 AM Jiufu Guo <guoji...@linux.ibm.com> wrote:
>
> Hi,
>
> Compare with previous version:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632399.html
> This version supports TImode/vector-mode accesses.
>
> There are a few PRs (meta-bug PR101926) on various targets.
> The root causes are similar: the aggregate params/returns
> are passed in multiple registers, but they are first stored
> from the registers to the stack, and the parameter is then
> accessed through its stack slot.
>
> A general idea to enhance this: access the aggregate
> parameters/returns directly through the incoming/outgoing
> scalar registers.  This is a kind of SRA.
>
> This experimental patch for light-expander-sra contains
> below parts:
>
> a. Check if the parameters/returns are ok/profitable to
>    scalarize, and set the incoming/outgoing registers(
>    pseudos) for the parameter/return.
>   - This is done in "expand_function_start", after the
>     incoming/outgoing hard registers are determined for the
>     parameter/return.
>     The scalarized registers are recorded in DECL_RTL for
>     the parameter/return in parallel form.
>   - At the time when setting DECL_RTL, "scalarizable_aggregate"
>     is called to check whether the accesses are ok/profitable to
>     scalarize.
>     We can continue to enhance this function, to support
>     more cases.  For example:
>     - 'reverse storage order'.
>     - 'writing to parameter'/'overlap accesses'.
>
> b. When expanding the accesses of the parameters/returns,
>    the scalars (pseudos) can be figured out from the access
>    info (e.g. bitpos, bitsize, mode) to expand the access.
>    This may happen when expanding the accesses below:
>   - The component access of a parameter: "_1 = arg.f1",
>     or a whole parameter access: the rhs of "_2 = arg".
>   - The assignment to a return value:
>     "D.xx = yy" or "D.xx.f = zz", where D.xx occurs in a return
>     stmt.
>   - This is mainly done in expr.cc (expand_expr_real_1 and
>     expand_assignment).  The function "extract_sub_member" is
>     used to figure out the scalar rtxs (pseudos).
>
> Besides the above two parts, some work is done on the GIMPLE
> IL: collecting the SRA candidates for parameters/returns, and
> collecting the SRA access info.
> This is mainly done at the beginning of the expander pass.
> Below are two major items of this part.
>  - Collect light-expand-sra candidates.
>   Each parameter is checked for a suitable aggregate type.
>   Return values (VAR_P) are collected from the return stmts
>   if the function returns via registers.
>   This is implemented in expand_sra::collect_sra_candidates.
>
>  - Build/collect/manage all the accesses on the candidates.
>   The function "scan_function" is used to do this work; it
>   goes through all basic blocks, and all interesting stmts
>   (phi, return, assign, call, asm) are checked.
>   If there is an interesting expression (e.g. COMPONENT_REF
>   or PARM_DECL), the required info for the access (e.g. pos,
>   size, type, base) is recorded.
>   If it is risky to do SRA, the candidate may be removed,
>   e.g. when its address is taken and it is accessed via memory:
>   "foo(struct S arg) {bar (&arg);}"
>
> This patch is tested on ppc64{,le} and x86_64.
> Is this ok for trunk?
>
> BR,
> Jeff (Jiufu Guo)
>
>         PR target/65421
>
> gcc/ChangeLog:
>
>         * cfgexpand.cc (struct access): New class.
>         (struct expand_sra): New class.
>         (expand_sra::collect_sra_candidates): New member function.
>         (expand_sra::add_sra_candidate): Likewise.
>         (expand_sra::build_access): Likewise.
>         (expand_sra::analyze_phi): Likewise.
>         (expand_sra::analyze_assign): Likewise.
>         (expand_sra::visit_base): Likewise.
>         (expand_sra::protect_mem_access_in_stmt): Likewise.
>         (expand_sra::expand_sra):  Class constructor.
>         (expand_sra::~expand_sra): Class destructor.
>         (expand_sra::scalarizable_access):  New member function.
>         (expand_sra::scalarizable_accesses):  Likewise.
>         (scalarizable_aggregate):  New function.
>         (set_scalar_rtx_for_returns):  New function.
>         (expand_value_return): Updated.
>         (expand_debug_expr): Updated.
>         (pass_expand::execute): Updated to use expand_sra.
>         * cfgexpand.h (scalarizable_aggregate): New declare.
>         (set_scalar_rtx_for_returns): New declare.
>         * expr.cc (expand_assignment): Updated.
>         (expand_constructor): Updated.
>         (query_position_in_parallel): New function.
>         (extract_sub_member): New function.
>         (expand_expr_real_1): Updated.
>         * expr.h (query_position_in_parallel): New declare.
>         * function.cc (assign_parm_setup_block): Updated.
>         (assign_parms): Updated.
>         (expand_function_start): Updated.
>         * tree-sra.h (struct sra_base_access): New class.
>         (struct sra_default_analyzer): New class.
>         (scan_function): New template function.
>         * var-tracking.cc (track_loc_p): Updated.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.target/powerpc/pr102024.C: Updated.
>         * gcc.target/powerpc/pr108073.c: New test.
>         * gcc.target/powerpc/pr65421-1.c: New test.
>         * gcc.target/powerpc/pr65421-2.c: New test.
>
> ---
>  gcc/cfgexpand.cc                             | 352 ++++++++++++++++++-
>  gcc/cfgexpand.h                              |   2 +
>  gcc/expr.cc                                  | 179 +++++++++-
>  gcc/expr.h                                   |   3 +
>  gcc/function.cc                              |  36 +-
>  gcc/tree-sra.h                               |  76 ++++
>  gcc/var-tracking.cc                          |   3 +-
>  gcc/testsuite/g++.target/powerpc/pr102024.C  |   2 +-
>  gcc/testsuite/gcc.target/i386/pr20020-2.c    |   5 +
>  gcc/testsuite/gcc.target/powerpc/pr108073.c  |  29 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c |   6 +
>  gcc/testsuite/gcc.target/powerpc/pr65421-2.c |  32 ++
>  12 files changed, 718 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108073.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-2.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 4262703a138..ef99ca8ac13 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -74,6 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "output.h"
>  #include "builtins.h"
>  #include "opts.h"
> +#include "tree-sra.h"
>
>  /* Some systems use __main in a way incompatible with its use in gcc, in these
>     cases use the macros NAME__MAIN to give a quoted symbol and SYMBOL__MAIN to
> @@ -97,6 +98,343 @@ static bool defer_stack_allocation (tree, bool);
>
>  static void record_alignment_for_reg_var (unsigned int);
>
> +/* Light SRA in the expander for parameters and returns.  */
> +struct access : public sra_base_access
> +{
> +};
> +
> +typedef struct access *access_p;
> +
> +struct expand_sra : public sra_default_analyzer
> +{
> +  /* Construct/destruct resources, e.g. sra candidates.  */
> +  expand_sra ();
> +  ~expand_sra ();
> +
> +  /* No actions for pre_analyze_stmt, analyze_return.  */
> +
> +  /* Override the phi/call/asm analyses.  */
> +  void analyze_phi (gphi *s);
> +
> +  /* TODO: Check accesses on call/asm.  */
> +  void analyze_call (gcall *s) { protect_mem_access_in_stmt (s); };
> +  void analyze_asm (gasm *s) { protect_mem_access_in_stmt (s); };
> +
> +  /* Check access of SRA on assignment.  */
> +  void analyze_assign (gassign *);
> +
> +  /* Check if the accesses of BASE (a parameter or return) are
> +     scalarizable, according to the incoming/outgoing REGS.  */
> +  bool scalarizable_accesses (tree base, rtx regs);
> +
> +private:
> +  /* Collect the parameter and returns to check if they are suitable for
> +     scalarization.  */
> +  bool collect_sra_candidates (void);
> +
> +  /* Return true if VAR is added as a candidate for SRA.  */
> +  bool add_sra_candidate (tree var);
> +
> +  /* Return the created access if EXPR contains an interesting SRA
> +     access, or NULL otherwise.  */
> +  access_p build_access (tree expr, bool write);
> +
> +  /* Check if the access ACC is scalarizable.  REGS is the incoming/outgoing
> +     registers which the access is based on. */
> +  bool scalarizable_access (access_p acc, rtx regs, bool is_parm);
> +
> +  /* If there is a risk (a store/load or address-taken), disqualify
> +     the SRA candidates used in the un-interesting STMT.  */
> +  void protect_mem_access_in_stmt (gimple *stmt);
> +
> +  /* Callback of walk_stmt_load_store_addr_ops, used to remove
> +     unscalarizable accesses.  */
> +  static bool visit_base (gimple *, tree op, tree, void *data);
> +
> +  /* Base (tree) -> Vector (vec<access_p> *) map.  */
> +  hash_map<tree, auto_vec<access_p> > *base_access_vec;
> +};
> +
> +bool
> +expand_sra::collect_sra_candidates (void)
> +{
> +  bool ret = false;
> +
> +  /* Collect parameters.  */
> +  for (tree parm = DECL_ARGUMENTS (current_function_decl); parm;
> +       parm = DECL_CHAIN (parm))
> +    ret |= add_sra_candidate (parm);
> +
> +  /* Collect VARs on returns.  */
> +  if (DECL_RESULT (current_function_decl))
> +    {
> +      edge_iterator ei;
> +      edge e;
> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +       if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> +         {
> +           tree val = gimple_return_retval (r);
> +           /* To scalarize the return, the return value should only be
> +              written (except by this return stmt).  Use 'true' (write)
> +              to mark the access as a write.  */
> +           if (val && VAR_P (val))
> +             ret |= add_sra_candidate (val) && build_access (val, true);
> +         }
> +    }
> +
> +  return ret;
> +}
> +
> +bool
> +expand_sra::add_sra_candidate (tree var)
> +{
> +  tree type = TREE_TYPE (var);
> +
> +  if (!AGGREGATE_TYPE_P (type) || !tree_fits_shwi_p (TYPE_SIZE (type))
> +      || tree_to_shwi (TYPE_SIZE (type)) == 0 || TREE_THIS_VOLATILE (var)
> +      || is_va_list_type (type))
> +    return false;
> +
> +  base_access_vec->get_or_insert (var);
> +
> +  return true;
> +}
> +
> +access_p
> +expand_sra::build_access (tree expr, bool write)
> +{
> +  enum tree_code code = TREE_CODE (expr);
> +  if (code != VAR_DECL && code != PARM_DECL && code != COMPONENT_REF
> +      && code != ARRAY_REF && code != ARRAY_RANGE_REF)
> +    return NULL;
> +
> +  HOST_WIDE_INT offset, size;
> +  bool reverse;
> +  tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse);
> +  if (!base || !DECL_P (base))
> +    return NULL;
> +
> +  vec<access_p> *access_vec = base_access_vec->get (base);
> +  if (!access_vec)
> +    return NULL;
> +
> +  /* TODO: support reverse. */
> +  if (reverse || size <= 0 || offset + size > tree_to_shwi (DECL_SIZE (base)))
> +    {
> +      base_access_vec->remove (base);
> +      return NULL;
> +    }
> +
> +  struct access *access = XNEWVEC (struct access, 1);
> +
> +  memset (access, 0, sizeof (struct access));
> +  access->offset = offset;
> +  access->size = size;
> +  access->expr = expr;
> +  access->write = write;
> +  access->reverse = reverse;
> +
> +  access_vec->safe_push (access);
> +
> +  return access;
> +}

I still think there's SRA code we can share for producing
accesses from a stmt.

> +
> +/* Disqualify the SRA candidates whose address is taken in the
> +   PHI statement STMT.  */
> +
> +void
> +expand_sra::analyze_phi (gphi *stmt)
> +{
> +  if (base_access_vec && !base_access_vec->is_empty ())
> +    walk_stmt_load_store_addr_ops (stmt, this, NULL, NULL, visit_base);
> +}
> +
> +void
> +expand_sra::analyze_assign (gassign *stmt)
> +{
> +  if (!base_access_vec || base_access_vec->is_empty ())
> +    return;
> +
> +  if (gimple_assign_single_p (stmt) && !gimple_clobber_p (stmt))
> +    {
> +      tree rhs = gimple_assign_rhs1 (stmt);
> +      tree lhs = gimple_assign_lhs (stmt);
> +      bool res_r = build_access (rhs, false);
> +      bool res_l = build_access (lhs, true);
> +
> +      if (res_l || res_r)
> +       return;
> +    }
> +
> +  protect_mem_access_in_stmt (stmt);
> +}
> +
> +/* Callback of walk_stmt_load_store_addr_ops, used to remove
> +   unscalarizable accesses.  Called by protect_mem_access_in_stmt.  */
> +
> +bool
> +expand_sra::visit_base (gimple *, tree op, tree, void *data)
> +{
> +  op = get_base_address (op);
> +  if (op && DECL_P (op))
> +    {
> +      expand_sra *p = (expand_sra *) data;
> +      p->base_access_vec->remove (op);
> +    }
> +  return false;
> +}
> +
> +/* Function protect_mem_access_in_stmt removes the SRA candidates if
> +   there is a store/load/address-taken of a candidate in STMT.
> +
> +   For statements which SRA does not care about, any possible memory
> +   operation on an SRA candidate makes it risky to scalarize.  */
> +
> +void
> +expand_sra::protect_mem_access_in_stmt (gimple *stmt)
> +{
> +  if (base_access_vec && !base_access_vec->is_empty ())
> +    walk_stmt_load_store_addr_ops (stmt, this, visit_base, visit_base,
> +                                  visit_base);
> +}
> +
> +expand_sra::expand_sra () : base_access_vec (NULL)
> +{
> +  if (optimize <= 0)
> +    return;
> +
> +  base_access_vec = new hash_map<tree, auto_vec<access_p> >;
> +  collect_sra_candidates ();
> +}
> +
> +expand_sra::~expand_sra ()
> +{
> +  if (optimize <= 0)
> +    return;
> +
> +  delete base_access_vec;
> +}
> +
> +bool
> +expand_sra::scalarizable_access (access_p acc, rtx regs, bool is_parm)
> +{
> +  /* Now only support reading from parms
> +     or writing to returns.  */
> +  if (is_parm && acc->write)
> +    return false;
> +  if (!is_parm && !acc->write)
> +    return false;
> +
> +  /* Compute the position of the access in the parallel regs.  */
> +  int start_index = -1;
> +  int end_index = -1;
> +  HOST_WIDE_INT left_bits = 0;
> +  HOST_WIDE_INT right_bits = 0;
> +  query_position_in_parallel (acc->offset, acc->size, regs, start_index,
> +                             end_index, left_bits, right_bits);
> +
> +  /* Invalid access position: padding or out of bounds.  */
> +  if (start_index < 0 || end_index < 0)
> +    return false;
> +
> +  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (acc->expr));
> +  /* Need multi-registers in a parallel for the access.  */
> +  if (expr_mode == BLKmode || end_index > start_index)
> +    {
> +      if (left_bits || right_bits)
> +       return false;
> +      if (expr_mode == BLKmode)
> +       return true;
> +
> +      /* For large modes, only support TImode/vector modes in multi-registers.  */
> +      if (known_gt (acc->size, GET_MODE_BITSIZE (word_mode)))
> +       return expr_mode == TImode || VECTOR_MODE_P (expr_mode);
> +      return true;
> +    }
> +
> +  gcc_assert (end_index == start_index);
> +
> +  /* Just need one reg for the access.  */
> +  if (left_bits == 0 && right_bits == 0)
> +    return true;
> +
> +  scalar_int_mode imode;
> +  /* Need to extract bits from the reg for the access.  */
> +  return !acc->write && int_mode_for_mode (expr_mode).exists (&imode);
> +}
> +
> +/* For now, the base (parm/return) is scalarizable only if all
> +   accesses of the BASE are scalarizable.
> +
> +   This function needs to be updated to support more complicated
> +   cases, like:
> +   - Some accesses are scalarizable, but some are not.
> +   - An access writes to a parameter.
> +   - Write accesses overlap with multiple accesses.  */
> +
> +bool
> +expand_sra::scalarizable_accesses (tree base, rtx regs)
> +{
> +  if (!base_access_vec)
> +    return false;
> +  vec<access_p> *access_vec = base_access_vec->get (base);
> +  if (!access_vec)
> +    return false;
> +  if (access_vec->is_empty ())
> +    return false;
> +
> +  bool is_parm = TREE_CODE (base) == PARM_DECL;
> +  int n = access_vec->length ();
> +  int cur_access_index = 0;
> +  for (; cur_access_index < n; cur_access_index++)
> +    if (!scalarizable_access ((*access_vec)[cur_access_index], regs, is_parm))
> +      break;
> +
> +  /* It is ok if all accesses are scalarizable.  */
> +  if (cur_access_index == n)
> +    return true;
> +
> +  base_access_vec->remove (base);
> +  return false;
> +}
> +
> +static expand_sra *current_sra = NULL;
> +
> +/* Check if the PARM (or return) is scalarizable.
> +
> +   This interface is used in expand_function_start
> +   to check the SRA possibility for parameters.  */
> +
> +bool
> +scalarizable_aggregate (tree parm, rtx regs)
> +{
> +  if (!current_sra)
> +    return false;
> +  return current_sra->scalarizable_accesses (parm, regs);
> +}
> +
> +/* Check for interesting returns and, if they are scalarizable,
> +   set DECL_RTL to the scalar registers.
> +
> +   This interface is used in expand_function_start
> +   when the outgoing registers are determined for DECL_RESULT.  */
> +
> +void
> +set_scalar_rtx_for_returns ()
> +{
> +  rtx res = DECL_RTL (DECL_RESULT (current_function_decl));
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> +    if (greturn *r = safe_dyn_cast<greturn *> (*gsi_last_bb (e->src)))
> +      {
> +       tree val = gimple_return_retval (r);
> +       if (val && VAR_P (val) && scalarizable_aggregate (val, res))
> +         SET_DECL_RTL (val, res);
> +      }
> +}
> +
>  /* Return an expression tree corresponding to the RHS of GIMPLE
>     statement STMT.  */
>
> @@ -3707,7 +4045,7 @@ expand_value_return (rtx val)
>
>    tree decl = DECL_RESULT (current_function_decl);
>    rtx return_reg = DECL_RTL (decl);
> -  if (return_reg != val)
> +  if (!rtx_equal_p (return_reg, val))
>      {
>        tree funtype = TREE_TYPE (current_function_decl);
>        tree type = TREE_TYPE (decl);
> @@ -4423,6 +4761,12 @@ expand_debug_expr (tree exp)
>    addr_space_t as;
>    scalar_int_mode op0_mode, op1_mode, addr_mode;
>
> +  /* TODO: Enable debugging of expand-sra-optimized parms/returns.  */
> +  tree base = get_base_address (exp);
> +  if ((TREE_CODE (base) == PARM_DECL || (VAR_P (base) && DECL_RTL_SET_P (base)))
> +      && GET_CODE (DECL_RTL (base)) == PARALLEL)
> +    return NULL_RTX;
> +
>    switch (TREE_CODE_CLASS (TREE_CODE (exp)))
>      {
>      case tcc_expression:
> @@ -6628,6 +6972,10 @@ pass_expand::execute (function *fun)
>    auto_bitmap forced_stack_vars;
>    discover_nonconstant_array_refs (forced_stack_vars);
>
> +  /* Enable light-expander-sra.  */
> +  current_sra = new expand_sra;
> +  scan_function (cfun, *current_sra);
> +
>    /* Make sure all values used by the optimization passes have sane
>       defaults.  */
>    reg_renumber = 0;
> @@ -7056,6 +7404,8 @@ pass_expand::execute (function *fun)
>        loop_optimizer_finalize ();
>      }
>
> +  delete current_sra;
> +  current_sra = NULL;
>    timevar_pop (TV_POST_EXPAND);
>
>    return 0;
> diff --git a/gcc/cfgexpand.h b/gcc/cfgexpand.h
> index 0e551f6cfd3..3415c217708 100644
> --- a/gcc/cfgexpand.h
> +++ b/gcc/cfgexpand.h
> @@ -24,5 +24,7 @@ extern tree gimple_assign_rhs_to_tree (gimple *);
>  extern HOST_WIDE_INT estimated_stack_frame_size (struct cgraph_node *);
>  extern void set_parm_rtl (tree, rtx);
>
> +extern bool scalarizable_aggregate (tree, rtx);
> +extern void set_scalar_rtx_for_returns ();
>
>  #endif /* GCC_CFGEXPAND_H */
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 763bd82c59f..5ba26e0ef52 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -5618,7 +5618,10 @@ expand_assignment (tree to, tree from, bool nontemporal)
>       Assignment of an array element at a constant index, and assignment of
>       an array element in an unaligned packed structure field, has the same
>       problem.  Same for (partially) storing into a non-memory object.  */
> -  if (handled_component_p (to)
> +  if ((handled_component_p (to)
> +       && !(VAR_P (get_base_address (to))

get_base_address is pretty expensive so please cache the result.

> +           && DECL_RTL_SET_P (get_base_address (to))
> +           && GET_CODE (DECL_RTL (get_base_address (to))) == PARALLEL))

I think that's possibly fragile in case a PARALLEL can also appear by
other means here.  I also think it's the wrong thing to map a structural
reference to a separate RTL expression, instead we should rewrite the
GIMPLE accordingly with artificial decls which would also be the way
to preserve debug info.

In principle "expander SRA" isn't different from regular SRA apart from
the fact that it handles parameter and return variables and heuristically
includes the ABI (does it?) in decomposition.

So the only important difference is that we cannot represent the
change for the incoming arguments or the outgoing regs in GIMPLE
but we should only have to special case _those_ bits, not all the
other accesses we can rewrite.

We could even use the actual SRA pass in a special mode right
before RTL expansion (or close to it) and invent a special (but ugly)
representation for the incoming/outgoing part, say by using
new internal functions present between the SRA pass and RTL
expansion:

struct X foo (struct X x)
{
  int D.1234 = .ARG_PART (x, offset, size);
  int D.1235 = .ARG_PART (x, offset, size);
  D.2345 = .SET_ARG_PARTS (D.1234, offset, size, D.1235, offset, size);
  return D.2345;
}

I'm sure that's not an ideal representation though but my point about
modifying the actual accesses stands.  Finding a working and not
too ugly way to represent the data flow between the original PARM_DECL
and the pieces used and the same for the return is key here unless
we want to rely on on-the-side data and nothing disrupting things between
that SRA and actual expansion (it would work to call the passes functions
directly from cfgexpand.cc of course - that's what you do as well).

Sorry for the very late reply here.

Thanks,
Richard.

>        || (TREE_CODE (to) == MEM_REF
>           && (REF_REVERSE_STORAGE_ORDER (to)
>               || mem_ref_refers_to_non_mem_p (to)))
> @@ -8909,6 +8912,19 @@ expand_constructor (tree exp, rtx target, enum expand_modifier modifier,
>        && ! mostly_zeros_p (exp))
>      return NULL_RTX;
>
> +  if (target && GET_CODE (target) == PARALLEL && all_zeros_p (exp))
> +    {
> +      int length = XVECLEN (target, 0);
> +      int start = XEXP (XVECEXP (target, 0, 0), 0) ? 0 : 1;
> +      for (int i = start; i < length; i++)
> +       {
> +         rtx dst = XEXP (XVECEXP (target, 0, i), 0);
> +         rtx zero = CONST0_RTX (GET_MODE (dst));
> +         emit_move_insn (dst, zero);
> +       }
> +      return target;
> +    }
> +
>    /* Handle calls that pass values in multiple non-contiguous
>       locations.  The Irix 6 ABI has examples of this.  */
>    if (target == 0 || ! safe_from_p (target, exp, 1)
> @@ -10621,6 +10637,157 @@ stmt_is_replaceable_p (gimple *stmt)
>    return false;
>  }
>
> +/* In the parallel register series REGS, compute the position of the
> +   given {BITPOS, BITSIZE} range.  START_INDEX, END_INDEX, LEFT_BITS
> +   and RIGHT_BITS are the computed outputs.  */
> +
> +void
> +query_position_in_parallel (HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
> +                           rtx regs, int &start_index, int &end_index,
> +                           HOST_WIDE_INT &left_bits, HOST_WIDE_INT 
> &right_bits)
> +{
> +  int cur_index = XEXP (XVECEXP (regs, 0, 0), 0) ? 0 : 1;
> +  for (; cur_index < XVECLEN (regs, 0); cur_index++)
> +    {
> +      rtx slot = XVECEXP (regs, 0, cur_index);
> +      HOST_WIDE_INT off = UINTVAL (XEXP (slot, 1)) * BITS_PER_UNIT;
> +      machine_mode mode = GET_MODE (XEXP (slot, 0));
> +      HOST_WIDE_INT size = GET_MODE_BITSIZE (mode).to_constant ();
> +      if (off <= bitpos && off + size > bitpos)
> +       {
> +         start_index = cur_index;
> +         left_bits = bitpos - off;
> +       }
> +      if (off + size >= bitpos + bitsize)
> +       {
> +         end_index = cur_index;
> +         right_bits = off + size - (bitpos + bitsize);
> +         break;
> +       }
> +    }
> +}
> +
> +/* Extract the scalar rtx(es) for the access {BITPOS, BITSIZE} of EXPR
> +   from the parallel register series REGS.  */
> +
> +static rtx
> +extract_sub_member (rtx regs, HOST_WIDE_INT bitpos, HOST_WIDE_INT bitsize,
> +                   tree expr)
> +{
> +  int start_index = -1;
> +  int end_index = -1;
> +  HOST_WIDE_INT left_bits = 0;
> +  HOST_WIDE_INT right_bits = 0;
> +  query_position_in_parallel (bitpos, bitsize, regs, start_index, end_index,
> +                             left_bits, right_bits);
> +
> +  machine_mode expr_mode = TYPE_MODE (TREE_TYPE (expr));
> +  if (end_index > start_index || expr_mode == BLKmode)
> +    {
> +      /* TImode in multi-registers.  */
> +      if (expr_mode == TImode)
> +       {
> +         rtx res = gen_reg_rtx (expr_mode);
> +         HOST_WIDE_INT start;
> +         start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
> +         for (int index = start_index; index <= end_index; index++)
> +           {
> +             rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +             machine_mode mode = GET_MODE (reg);
> +             HOST_WIDE_INT off;
> +             off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start;
> +             rtx sub = simplify_gen_subreg (mode, res, expr_mode, off);
> +             emit_move_insn (sub, reg);
> +           }
> +         return res;
> +       }
> +
> +      /* Vector in multi-registers.  */
> +      if (VECTOR_MODE_P (expr_mode))
> +       {
> +         rtvec vector = rtvec_alloc (end_index - start_index + 1);
> +         machine_mode emode;
> +         emode = GET_MODE (XEXP (XVECEXP (regs, 0, start_index), 0));
> +         for (int index = start_index; index <= end_index; index++)
> +           {
> +             rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +             gcc_assert (emode == GET_MODE (reg));
> +             RTVEC_ELT (vector, index - start_index) = reg;
> +           }
> +         scalar_int_mode imode;
> +         machine_mode vmode;
> +         int nunits = end_index - start_index + 1;
> +         if (!(int_mode_for_mode (emode).exists (&imode)
> +               && mode_for_vector (imode, nunits).exists (&vmode)))
> +           gcc_unreachable ();
> +
> +         insn_code icode;
> +         icode = convert_optab_handler (vec_init_optab, vmode, imode);
> +         rtx res = gen_reg_rtx (vmode);
> +         emit_insn (GEN_FCN (icode) (res, gen_rtx_PARALLEL (vmode, vector)));
> +         if (expr_mode == vmode)
> +           return res;
> +         return simplify_gen_subreg (expr_mode, res, vmode, 0);
> +       }
> +
> +      /* Need multi-registers in a parallel for the access.  */
> +      int num_words = end_index - start_index + 1;
> +      rtx *tmps = XALLOCAVEC (rtx, num_words);
> +
> +      int pos = 0;
> +      HOST_WIDE_INT start;
> +      start = UINTVAL (XEXP (XVECEXP (regs, 0, start_index), 1));
> +      /* Extract whole registers.  */
> +      for (; pos < num_words; pos++)
> +       {
> +         int index = start_index + pos;
> +         rtx reg = XEXP (XVECEXP (regs, 0, index), 0);
> +         machine_mode mode = GET_MODE (reg);
> +         HOST_WIDE_INT off;
> +         off = UINTVAL (XEXP (XVECEXP (regs, 0, index), 1)) - start;
> +         tmps[pos] = gen_rtx_EXPR_LIST (mode, reg, GEN_INT (off));
> +       }
> +
> +      rtx res = gen_rtx_PARALLEL (expr_mode, gen_rtvec_v (pos, tmps));
> +      return res;
> +    }
> +
> +  gcc_assert (end_index == start_index);
> +
> +  /* Just need one reg for the access.  */
> +  if (left_bits == 0 && right_bits == 0)
> +    {
> +      rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
> +      if (GET_MODE (reg) != expr_mode)
> +       reg = gen_lowpart (expr_mode, reg);
> +      return reg;
> +    }
> +
> +  /* Need to extract a bitfield from the reg for the access:
> +     left_bits != 0 or right_bits != 0.  */
> +  rtx reg = XEXP (XVECEXP (regs, 0, start_index), 0);
> +  bool sgn = TYPE_UNSIGNED (TREE_TYPE (expr));
> +  scalar_int_mode imode;
> +  if (!int_mode_for_mode (expr_mode).exists (&imode))
> +    {
> +      gcc_assert (false);
> +      return NULL_RTX;
> +    }
> +
> +  machine_mode mode = GET_MODE (reg);
> +  bool reverse = false;
> +  rtx bfld = extract_bit_field (reg, bitsize, left_bits, sgn, NULL_RTX, mode,
> +                               imode, reverse, NULL);
> +
> +  if (GET_MODE (bfld) != imode)
> +    bfld = gen_lowpart (imode, bfld);
> +
> +  if (expr_mode == imode)
> +    return bfld;
> +
> +  /* expr_mode != imode, e.g. SF != SI.  */
> +  rtx result = gen_reg_rtx (imode);
> +  emit_move_insn (result, bfld);
> +  return gen_lowpart (expr_mode, result);
> +}
> +
>  rtx
>  expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>                     enum expand_modifier modifier, rtx *alt_rtl,
> @@ -11498,6 +11665,16 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
>           = expand_expr_real (tem, tem_target, VOIDmode, tem_modifier, NULL,
>                               true);
>
> +       /* It is a scalarizable access on a param passed by registers.  */
> +       if (GET_CODE (op0) == PARALLEL
> +           && (TREE_CODE (tem) == PARM_DECL || VAR_P (tem)))
> +         {
> +           HOST_WIDE_INT pos, size;
> +           size = bitsize.to_constant ();
> +           pos = bitpos.to_constant ();
> +           return extract_sub_member (op0, pos, size, exp);
> +         }
> +
>         /* If the field has a mode, we want to access it in the
>            field's mode, not the computed mode.
>            If a MEM has VOIDmode (external with incomplete type),
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 2a172867fdb..8a9332aaad6 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -362,5 +362,8 @@ extern rtx expr_size (tree);
>
>  extern bool mem_ref_refers_to_non_mem_p (tree);
>  extern bool non_mem_decl_p (tree);
> +extern void query_position_in_parallel (HOST_WIDE_INT, HOST_WIDE_INT, rtx,
> +                                       int &, int &, HOST_WIDE_INT &,
> +                                       HOST_WIDE_INT &);
>
>  #endif /* GCC_EXPR_H */
> diff --git a/gcc/function.cc b/gcc/function.cc
> index afb0b33da9e..518250b2728 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -3107,8 +3107,29 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
>           emit_move_insn (mem, entry_parm);
>         }
>        else
> -       move_block_from_reg (REGNO (entry_parm), mem,
> -                            size_stored / UNITS_PER_WORD);
> +       {
> +         int regno = REGNO (entry_parm);
> +         int nregs = size_stored / UNITS_PER_WORD;
> +         rtx *tmps = XALLOCAVEC (rtx, nregs);
> +         machine_mode mode = word_mode;
> +         HOST_WIDE_INT word_size = GET_MODE_SIZE (mode).to_constant ();
> +         for (int i = 0; i < nregs; i++)
> +           {
> +             rtx reg = gen_rtx_REG (mode, regno + i);
> +             rtx off = GEN_INT (word_size * i);
> +             tmps[i] = gen_rtx_EXPR_LIST (VOIDmode, reg, off);
> +           }
> +
> +         rtx regs = gen_rtx_PARALLEL (BLKmode, gen_rtvec_v (nregs, tmps));
> +         if (scalarizable_aggregate (parm, regs))
> +           {
> +             rtx pseudos = gen_group_rtx (regs);
> +             emit_group_move (pseudos, regs);
> +             stack_parm = pseudos;
> +           }
> +         else
> +           move_block_from_reg (regno, mem, nregs);
> +       }
>      }
>    else if (data->stack_parm == 0 && !TYPE_EMPTY_P (data->arg.type))
>      {
> @@ -3710,7 +3731,15 @@ assign_parms (tree fndecl)
>
>        assign_parm_adjust_stack_rtl (&data);
>
> -      if (assign_parm_setup_block_p (&data))
> +      rtx incoming = DECL_INCOMING_RTL (parm);
> +      if (GET_CODE (incoming) == PARALLEL
> +         && scalarizable_aggregate (parm, incoming))
> +       {
> +         rtx pseudos = gen_group_rtx (incoming);
> +         emit_group_move (pseudos, incoming);
> +         set_parm_rtl (parm, pseudos);
> +       }
> +      else if (assign_parm_setup_block_p (&data))
>         assign_parm_setup_block (&all, parm, &data);
>        else if (data.arg.pass_by_reference || use_register_for_decl (parm))
>         assign_parm_setup_reg (&all, parm, &data);
> @@ -5128,6 +5157,7 @@ expand_function_start (tree subr)
>             {
>               gcc_assert (GET_CODE (hard_reg) == PARALLEL);
>               set_parm_rtl (res, gen_group_rtx (hard_reg));
> +             set_scalar_rtx_for_returns ();
>             }
>         }
>
> diff --git a/gcc/tree-sra.h b/gcc/tree-sra.h
> index f20266c4622..df3071ccf6e 100644
> --- a/gcc/tree-sra.h
> +++ b/gcc/tree-sra.h
> @@ -19,6 +19,82 @@ You should have received a copy of the GNU General Public License
>  along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>
> +struct sra_base_access
> +{
> +  /* The OFFSET and SIZE of the access, as returned by
> +     get_ref_base_and_extent.  */
> +  HOST_WIDE_INT offset;
> +  HOST_WIDE_INT size;
> +
> +  /* The context expression of this access.  */
> +  tree expr;
> +
> +  /* Indicates this is a write access.  */
> +  bool write : 1;
> +
> +  /* Indicates if this access is made in reverse storage order.  */
> +  bool reverse : 1;
> +};
> +
> +/* Default analyzer template for scan_function.  */
> +
> +struct sra_default_analyzer
> +{
> +  /* Template analyze functions.  */
> +  void analyze_phi (gphi *){};
> +  void pre_analyze_stmt (gimple *){};
> +  void analyze_return (greturn *){};
> +  void analyze_assign (gassign *){};
> +  void analyze_call (gcall *){};
> +  void analyze_asm (gasm *){};
> +  void analyze_default_stmt (gimple *){};
> +};
> +
> +/* Scan function and look for interesting expressions.  */
> +
> +template <typename analyzer>
> +void
> +scan_function (struct function *fun, analyzer &a)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
> +          gsi_next (&gsi))
> +       a.analyze_phi (gsi.phi ());
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> +          gsi_next (&gsi))
> +       {
> +         gimple *stmt = gsi_stmt (gsi);
> +         a.pre_analyze_stmt (stmt);
> +
> +         switch (gimple_code (stmt))
> +           {
> +           case GIMPLE_RETURN:
> +             a.analyze_return (as_a<greturn *> (stmt));
> +             break;
> +
> +           case GIMPLE_ASSIGN:
> +             a.analyze_assign (as_a<gassign *> (stmt));
> +             break;
> +
> +           case GIMPLE_CALL:
> +             a.analyze_call (as_a<gcall *> (stmt));
> +             break;
> +
> +           case GIMPLE_ASM:
> +             a.analyze_asm (as_a<gasm *> (stmt));
> +             break;
> +
> +           default:
> +             a.analyze_default_stmt (stmt);
> +             break;
> +           }
> +       }
> +    }
> +}
> +
>  bool type_internals_preclude_sra_p (tree type, const char **msg);
>
>  /* Return true iff TYPE is stdarg va_list type (which early SRA and IPA-SRA
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index d8dafa5481a..7fc801f2612 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -5352,7 +5352,8 @@ track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p,
>       because the real and imaginary parts are represented as separate
>       pseudo registers, even if the whole complex value fits into one
>       hard register.  */
> -  if ((paradoxical_subreg_p (mode, DECL_MODE (expr))
> +  if (((DECL_MODE (expr) != BLKmode
> +       && paradoxical_subreg_p (mode, DECL_MODE (expr)))
>         || (store_reg_p
>            && !COMPLEX_MODE_P (DECL_MODE (expr))
>            && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1))
> diff --git a/gcc/testsuite/g++.target/powerpc/pr102024.C b/gcc/testsuite/g++.target/powerpc/pr102024.C
> index 769585052b5..c8995cae707 100644
> --- a/gcc/testsuite/g++.target/powerpc/pr102024.C
> +++ b/gcc/testsuite/g++.target/powerpc/pr102024.C
> @@ -5,7 +5,7 @@
>  // Test that a zero-width bit field in an otherwise homogeneous aggregate
>  // generates a psabi warning and passes arguments in GPRs.
>
> -// { dg-final { scan-assembler-times {\mstd\M} 4 } }
> +// { dg-final { scan-assembler-times {\mmtvsrd\M} 4 } }
>
>  struct a_thing
>  {
> diff --git a/gcc/testsuite/gcc.target/i386/pr20020-2.c b/gcc/testsuite/gcc.target/i386/pr20020-2.c
> index fa8cb2528c5..723f1826630 100644
> --- a/gcc/testsuite/gcc.target/i386/pr20020-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr20020-2.c
> @@ -15,10 +15,15 @@ struct shared_ptr_struct
>  };
>  typedef struct shared_ptr_struct sptr_t;
>
> +void foo (sptr_t *);
> +
>  void
>  copy_sptr (sptr_t *dest, sptr_t src)
>  {
>    *dest = src;
> +
> +  /* Prevent 'src' from being scalarized into registers.  */
> +  foo (&src);
>  }
>
>  /* { dg-final { scan-rtl-dump "\\\(set \\\(reg:TI \[0-9\]*" "expand" } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr108073.c b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> new file mode 100644
> index 00000000000..293bf93fb9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr108073.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +
> +typedef struct DF {double a[4]; short s1; short s2; short s3; short s4; } DF;
> +typedef struct SF {float a[4]; int i1; int i2; } SF;
> +
> +/* { dg-final { scan-assembler-times {\mmtvsrd|mtvsrws\M} 3 {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-not {\mlwz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +/* { dg-final { scan-assembler-not {\mlhz\M} {target { has_arch_ppc64 && has_arch_pwr8 } } } } */
> +short  __attribute__ ((noipa)) foo_hi (DF a, int flag){if (flag == 2)return a.s2+a.s3;return 0;}
> +int  __attribute__ ((noipa)) foo_si (SF a, int flag){if (flag == 2)return a.i2+a.i1;return 0;}
> +double __attribute__ ((noipa)) foo_df (DF arg, int flag){if (flag == 2)return arg.a[3];else return 0.0;}
> +float  __attribute__ ((noipa)) foo_sf (SF arg, int flag){if (flag == 2)return arg.a[2]; return 0;}
> +float  __attribute__ ((noipa)) foo_sf1 (SF arg, int flag){if (flag == 2)return arg.a[1];return 0;}
> +
> +DF gdf = {{1.0,2.0,3.0,4.0}, 1, 2, 3, 4};
> +SF gsf = {{1.0f,2.0f,3.0f,4.0f}, 1, 2};
> +
> +int main()
> +{
> +  if (!(foo_hi (gdf, 2) == 5 && foo_si (gsf, 2) == 3 && foo_df (gdf, 2) == 4.0
> +       && foo_sf (gsf, 2) == 3.0 && foo_sf1 (gsf, 2) == 2.0))
> +    __builtin_abort ();
> +  if (!(foo_hi (gdf, 1) == 0 && foo_si (gsf, 1) == 0 && foo_df (gdf, 1) == 0
> +       && foo_sf (gsf, 1) == 0 && foo_sf1 (gsf, 1) == 0))
> +    __builtin_abort ();
> +  return 0;
> +}
> +
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-1.c b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> new file mode 100644
> index 00000000000..4e1f87f7939
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
> @@ -0,0 +1,6 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +
> +typedef struct LARGE {double a[4]; int arr[32];} LARGE;
> +LARGE foo (LARGE a){return a;}
> +/* { dg-final { scan-assembler-times {\mmemcpy\M} 1 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr65421-2.c b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> new file mode 100644
> index 00000000000..8a8e1a0e996
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-2.c
> @@ -0,0 +1,32 @@
> +/* PR target/65421 */
> +/* { dg-options "-O2" } */
> +/* { dg-require-effective-target powerpc_elfv2 } */
> +/* { dg-require-effective-target has_arch_ppc64 } */
> +
> +typedef struct FLOATS
> +{
> +  double a[3];
> +} FLOATS;
> +
> +/* 3 lfd, once returns are also optimized */
> +/* FLOATS ret_arg_pt (FLOATS *a){return *a;} */
> +
> +/* 3 stfd */
> +void st_arg (FLOATS a, FLOATS *p) {*p = a;}
> +/* { dg-final { scan-assembler-times {\mstfd\M} 3 } } */
> +
> +/* blr */
> +FLOATS ret_arg (FLOATS a) {return a;}
> +
> +typedef struct MIX
> +{
> +  double a[2];
> +  long l;
> +} MIX;
> +
> +/* std 3 param regs to return slot */
> +MIX ret_arg1 (MIX a) {return a;}
> +/* { dg-final { scan-assembler-times {\mstd\M} 3 } } */
> +
> +/* count insns */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 9 } } */
> --
> 2.25.1
>
