Richard Biener <rguent...@suse.de> writes:
> This adds the capability to analyze the dependence of mixed
> pointer/array accesses.  The motivating example is one where a
> masked load/store creates a pointer-based access for an otherwise
> unconditional array-based access.  Other examples would include
> accesses to an array mixed with accesses from inlined helpers
> that work on pointers.
>
> The idea is quite simple and old - analyze the data-ref indices
> as if the reference was pointer-based.  The following change does
> this by changing dr_analyze_indices to work on the indices
> sub-structure and storing an alternate indices substructure in
> each data reference.  That alternate set of indices is analyzed
> lazily by initialize_data_dependence_relation when it fails to
> match-up the main set of indices of two data references.
> initialize_data_dependence_relation is refactored into a head
> and a tail worker and changed to work on one of the indices
> structures and thus away from using DR_* access macros which
> continue to reference the main indices substructure.
>
> There are quite a few vectorization and loop distribution opportunities
> unleashed in SPEC CPU 2017; notably 520.omnetpp_r, 548.exchange2_r,
> 510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
> 544.nab_r see changes in what they report with -fopt-info-loop, while
> the rest of the specrate set sees no changes there.  Measuring runtime
> for the set where changes were reported reveals nothing off-noise
> besides 511.povray_r, which seems to regress slightly for me
> (on a Zen2 machine with -Ofast -march=native).
>
> Changes from the [RFC] version are proper handling of bitfields
> whose address we cannot take, and skipping refs that already are
> MEM_REFs and thus won't see any change.  I've also elided changing
> the set of vect_masked_store targets in favor of explicitly listing
> avx (but I did not verify whether the testcase passes on aarch64-sve
> or amdgcn).
>
> This improves cases like the following from Povray:
>
>    for(i = 0; i < Sphere_Sweep->Num_Modeling_Spheres; i++)
>      {
>         VScaleEq(Sphere_Sweep->Modeling_Sphere[i].Center, Vector[X]);
>         Sphere_Sweep->Modeling_Sphere[i].Radius *= Vector[X];
>      }
>
> where a plain array access is mixed with an abstraction
> using T[] or T* arguments.  That should not be too uncommon
> a situation in the wild.  The loop above is now vectorized and was
> not before this change.
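
(For context, the mixed-access pattern boils down to something like the
following standalone sketch.  The names and the scale_eq helper are
invented stand-ins for VScaleEq and the Povray types, not the actual
sources; the point is that once scale_eq is inlined its access is
pointer-based while the radius update stays a plain array/component
access to the same objects.)

```c
#include <assert.h>

/* Helper taking a pointer parameter; after inlining, its accesses
   into the caller's array are pointer-based.  */
static void
scale_eq (double *v, double s)
{
  for (int j = 0; j < 3; j++)
    v[j] *= s;
}

struct sphere { double center[3]; double radius; };

/* Mixed accesses to the same objects: pointer-based via the inlined
   scale_eq, array/component-based for the radius update.  */
void
scale_spheres (struct sphere *spheres, int n, double f)
{
  for (int i = 0; i < n; i++)
    {
      scale_eq (spheres[i].center, f);
      spheres[i].radius *= f;
    }
}
```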
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu and I've
> built and run SPEC CPU 2017 successfully.
>
> OK?

Took a while to page this stuff back in :-/

I guess if we're adding alt_indices to the main data_reference,
we'll need to free the access_fns in free_data_ref.  It looked like
the patch as posted might have a memory leak.

Perhaps instead we could use local indices structures for the
alt_indices and pass them in to initialize_data_dependence_relation?
Not that that's very elegant…
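
FWIW, if you keep alt_indices in the data_reference I'd expect the fix
to be a one-liner along these lines (a sketch against my reading of
tree-data-ref.c, untested; today free_data_ref only releases the main
access_fns):

```c
void
free_data_ref (data_reference_p dr)
{
  DR_ACCESS_FNS (dr).release ();
  /* Release the lazily computed alternate access functions too,
     otherwise they leak whenever the fallback analysis ran.  */
  dr->alt_indices.access_fns.release ();
  free (dr);
}
```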

> Thanks,
> Richard.
>
> 2021-09-08  Richard Biener  <rguent...@suse.de>
>
>       PR tree-optimization/65206
>       * tree-data-ref.h (struct data_reference): Add alt_indices,
>       order it last.
>       * tree-data-ref.c (dr_analyze_indices): Work on
>       struct indices and get DR_REF as tree.
>       (create_data_ref): Adjust.
>       (initialize_data_dependence_relation): Split into head
>       and tail.  When the base objects fail to match up try
>       again with pointer-based analysis of indices.
>       * tree-vectorizer.c (vec_info_shared::check_datarefs): Do
>       not compare the lazily computed alternate set of indices.
>
>       * gcc.dg/torture/20210916.c: New testcase.
>       * gcc.dg/vect/pr65206.c: Likewise.
> ---
>  gcc/testsuite/gcc.dg/torture/20210916.c |  20 +++
>  gcc/testsuite/gcc.dg/vect/pr65206.c     |  22 +++
>  gcc/tree-data-ref.c                     | 173 ++++++++++++++++--------
>  gcc/tree-data-ref.h                     |   9 +-
>  gcc/tree-vectorizer.c                   |   3 +-
>  5 files changed, 167 insertions(+), 60 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/20210916.c
>  create mode 100644 gcc/testsuite/gcc.dg/vect/pr65206.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/20210916.c b/gcc/testsuite/gcc.dg/torture/20210916.c
> new file mode 100644
> index 00000000000..0ea6d45e463
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/20210916.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +
> +typedef union tree_node *tree;
> +struct tree_base {
> +  unsigned : 1;
> +  unsigned lang_flag_2 : 1;
> +};
> +struct tree_type {
> +  tree main_variant;
> +};
> +union tree_node {
> +  struct tree_base base;
> +  struct tree_type type;
> +};
> +tree finish_struct_t, finish_struct_x;
> +void finish_struct()
> +{
> +  for (; finish_struct_t->type.main_variant;)
> +    finish_struct_x->base.lang_flag_2 = 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/vect/pr65206.c b/gcc/testsuite/gcc.dg/vect/pr65206.c
> new file mode 100644
> index 00000000000..3b6262622c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr65206.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-additional-options "-fno-trapping-math -fno-allow-store-data-races" } */
> +/* { dg-additional-options "-mavx" { target avx } } */
> +
> +#define N 1024
> +
> +double a[N], b[N];
> +
> +void foo ()
> +{
> +  for (int i = 0; i < N; ++i)
> +    if (b[i] < 3.)
> +      a[i] += b[i];
> +}
> +
> +/* We get a .MASK_STORE because while the load of a[i] does not trap
> +   the store would introduce store data races.  Make sure we still
> +   can handle the data dependence with zero distance.  */
> +
> +/* { dg-final { scan-tree-dump-not "versioning for alias required" "vect" { target { vect_masked_store || avx } } } } */
> +/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" { target { vect_masked_store || avx } } } } */
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index e061baa7c20..3656b7ba80e 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "internal-fn.h"
>  #include "vr-values.h"
>  #include "range-op.h"
> +#include "tree-ssa-loop-ivopts.h"
>  
>  static struct datadep_stats
>  {
> @@ -1300,22 +1301,18 @@ base_supports_access_fn_components_p (tree base)
>     DR, analyzed in LOOP and instantiated before NEST.  */
>  
>  static void
> -dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
> +dr_analyze_indices (struct indices *dri, tree ref, edge nest, loop_p loop)
>  {
> -  vec<tree> access_fns = vNULL;
> -  tree ref, op;
> -  tree base, off, access_fn;
> -
>    /* If analyzing a basic-block there are no indices to analyze
>       and thus no access functions.  */
>    if (!nest)
>      {
> -      DR_BASE_OBJECT (dr) = DR_REF (dr);
> -      DR_ACCESS_FNS (dr).create (0);
> +      dri->base_object = ref;
> +      dri->access_fns.create (0);
>        return;
>      }
>  
> -  ref = DR_REF (dr);
> +  vec<tree> access_fns = vNULL;
>  
>    /* REALPART_EXPR and IMAGPART_EXPR can be handled like accesses
>       into a two element array with a constant index.  The base is
> @@ -1338,8 +1335,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>      {
>        if (TREE_CODE (ref) == ARRAY_REF)
>       {
> -       op = TREE_OPERAND (ref, 1);
> -       access_fn = analyze_scalar_evolution (loop, op);
> +       tree op = TREE_OPERAND (ref, 1);
> +       tree access_fn = analyze_scalar_evolution (loop, op);
>         access_fn = instantiate_scev (nest, loop, access_fn);
>         access_fns.safe_push (access_fn);
>       }
> @@ -1370,16 +1367,16 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>       analyzed nest, add it as an additional independent access-function.  */
>    if (TREE_CODE (ref) == MEM_REF)
>      {
> -      op = TREE_OPERAND (ref, 0);
> -      access_fn = analyze_scalar_evolution (loop, op);
> +      tree op = TREE_OPERAND (ref, 0);
> +      tree access_fn = analyze_scalar_evolution (loop, op);
>        access_fn = instantiate_scev (nest, loop, access_fn);
>        if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
>       {
> -       tree orig_type;
>         tree memoff = TREE_OPERAND (ref, 1);
> -       base = initial_condition (access_fn);
> -       orig_type = TREE_TYPE (base);
> +       tree base = initial_condition (access_fn);
> +       tree orig_type = TREE_TYPE (base);
>         STRIP_USELESS_TYPE_CONVERSION (base);
> +       tree off;
>         split_constant_offset (base, &base, &off);
>         STRIP_USELESS_TYPE_CONVERSION (base);
>         /* Fold the MEM_REF offset into the evolutions initial
> @@ -1424,7 +1421,7 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>                                base, memoff);
>         MR_DEPENDENCE_CLIQUE (ref) = MR_DEPENDENCE_CLIQUE (old);
>         MR_DEPENDENCE_BASE (ref) = MR_DEPENDENCE_BASE (old);
> -       DR_UNCONSTRAINED_BASE (dr) = true;
> +       dri->unconstrained_base = true;
>         access_fns.safe_push (access_fn);
>       }
>      }
> @@ -1436,8 +1433,8 @@ dr_analyze_indices (struct data_reference *dr, edge nest, loop_p loop)
>                   build_int_cst (reference_alias_ptr_type (ref), 0));
>      }
>  
> -  DR_BASE_OBJECT (dr) = ref;
> -  DR_ACCESS_FNS (dr) = access_fns;
> +  dri->base_object = ref;
> +  dri->access_fns = access_fns;
>  }
>  
>  /* Extracts the alias analysis information from the memory reference DR.  */
> @@ -1497,7 +1494,7 @@ create_data_ref (edge nest, loop_p loop, tree memref, gimple *stmt,
>  
>    dr_analyze_innermost (&DR_INNERMOST (dr), memref,
>                       nest != NULL ? loop : NULL, stmt);
> -  dr_analyze_indices (dr, nest, loop);
> +  dr_analyze_indices (&dr->indices, DR_REF (dr), nest, loop);
>    dr_analyze_alias (dr);
>  
>    if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -3066,41 +3063,30 @@ access_fn_components_comparable_p (tree ref_a, tree ref_b)
>                            TREE_TYPE (TREE_OPERAND (ref_b, 0)));
>  }
>  
> -/* Initialize a data dependence relation between data accesses A and
> -   B.  NB_LOOPS is the number of loops surrounding the references: the
> -   size of the classic distance/direction vectors.  */
> +/* Initialize a data dependence relation RES in LOOP_NEST.  USE_ALT_INDICES
> +   is true when the main indices of A and B were not comparable so we try again
> +   with alternate indices computed on an indirect reference.  */
>  
>  struct data_dependence_relation *
> -initialize_data_dependence_relation (struct data_reference *a,
> -                                  struct data_reference *b,
> -                                  vec<loop_p> loop_nest)
> +initialize_data_dependence_relation (struct data_dependence_relation *res,
> +                                  vec<loop_p> loop_nest,
> +                                  bool use_alt_indices)
>  {
> -  struct data_dependence_relation *res;
> +  struct data_reference *a = DDR_A (res);
> +  struct data_reference *b = DDR_B (res);
>    unsigned int i;
>  
> -  res = XCNEW (struct data_dependence_relation);
> -  DDR_A (res) = a;
> -  DDR_B (res) = b;
> -  DDR_LOOP_NEST (res).create (0);
> -  DDR_SUBSCRIPTS (res).create (0);
> -  DDR_DIR_VECTS (res).create (0);
> -  DDR_DIST_VECTS (res).create (0);
> -
> -  if (a == NULL || b == NULL)
> +  struct indices *indices_a = &a->indices;
> +  struct indices *indices_b = &b->indices;
> +  if (use_alt_indices)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      if (TREE_CODE (DR_REF (a)) != MEM_REF)
> +     indices_a = &a->alt_indices;
> +      if (TREE_CODE (DR_REF (b)) != MEM_REF)
> +     indices_b = &b->alt_indices;
>      }

BTW, I didn't audit this to check that every DR_* macro access had
been updated :-)

> -
> -  /* If the data references do not alias, then they are independent.  */
> -  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> -    {
> -      DDR_ARE_DEPENDENT (res) = chrec_known;
> -      return res;
> -    }
> -
> -  unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a);
> -  unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b);
> +  unsigned int num_dimensions_a = indices_a->access_fns.length ();
> +  unsigned int num_dimensions_b = indices_b->access_fns.length ();
>    if (num_dimensions_a == 0 || num_dimensions_b == 0)
>      {
>        DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> @@ -3125,9 +3111,9 @@ initialize_data_dependence_relation (struct data_reference *a,
>  
>       the a and b accesses have a single ARRAY_REF component reference [0]
>       but have two subscripts.  */
> -  if (DR_UNCONSTRAINED_BASE (a))
> +  if (indices_a->unconstrained_base)
>      num_dimensions_a -= 1;
> -  if (DR_UNCONSTRAINED_BASE (b))
> +  if (indices_b->unconstrained_base)
>      num_dimensions_b -= 1;
>  
>    /* These structures describe sequences of component references in
> @@ -3210,6 +3196,10 @@ initialize_data_dependence_relation (struct data_reference *a,
>          B: [3, 4]  (i.e. s.e)  */
>    while (index_a < num_dimensions_a && index_b < num_dimensions_b)
>      {
> +      /* The alternate indices form always has a single dimension
> +      with unconstrained base.  */
> +      gcc_assert (!use_alt_indices);
> +
>        /* REF_A and REF_B must be one of the component access types
>        allowed by dr_analyze_indices.  */
>        gcc_checking_assert (access_fn_component_p (ref_a));
> @@ -3280,11 +3270,12 @@ initialize_data_dependence_relation (struct data_reference *a,
>    /* See whether FULL_SEQ ends at the base and whether the two bases
>       are equal.  We do not care about TBAA or alignment info so we can
>       use OEP_ADDRESS_OF to avoid false negatives.  */
> -  tree base_a = DR_BASE_OBJECT (a);
> -  tree base_b = DR_BASE_OBJECT (b);
> +  tree base_a = indices_a->base_object;
> +  tree base_b = indices_b->base_object;
>    bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a
>                     && full_seq.start_b + full_seq.length == num_dimensions_b
> -                   && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b)
> +                   && (indices_a->unconstrained_base
> +                       == indices_b->unconstrained_base)
>                     && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF)
>                     && (types_compatible_p (TREE_TYPE (base_a),
>                                             TREE_TYPE (base_b))
> @@ -3323,7 +3314,7 @@ initialize_data_dependence_relation (struct data_reference *a,
>       both lvalues are distinct from the object's declared type.  */
>    if (same_base_p)
>      {
> -      if (DR_UNCONSTRAINED_BASE (a))
> +      if (indices_a->unconstrained_base)
>       full_seq.length += 1;
>      }
>    else
> @@ -3332,8 +3323,42 @@ initialize_data_dependence_relation (struct data_reference *a,
>    /* Punt if we didn't find a suitable sequence.  */
>    if (full_seq.length == 0)
>      {
> -      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> -      return res;
> +      if (use_alt_indices
> +       || !param_data_dep_alt_indices

It doesn't look like the patch includes a definition of this.  Did you
decide to add a --param, or to ditch an earlier one?

> +       || (TREE_CODE (DR_REF (a)) == MEM_REF
> +           && TREE_CODE (DR_REF (b)) == MEM_REF)

Might be a daft question, but do we gain anything by doing this
when neither reference is a MEM_REF?  I.e. could the && be a !=?

Looks OK to me otherwise.

Thanks,
Richard

> +       || may_be_nonaddressable_p (DR_REF (a))
> +       || may_be_nonaddressable_p (DR_REF (b)))
> +     {
> +       /* Fully exhausted possibilities.  */
> +       DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +       return res;
> +     }
> +
> +      /* Try evaluating both DRs as dereferences of pointers.  */
> +      if (!a->alt_indices.base_object
> +       && TREE_CODE (DR_REF (a)) != MEM_REF)
> +     {
> +       tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (a)),
> +                              build1 (ADDR_EXPR, ptr_type_node, DR_REF (a)),
> +                              build_int_cst
> +                                (reference_alias_ptr_type (DR_REF (a)), 0));
> +       dr_analyze_indices (&a->alt_indices, alt_ref,
> +                           loop_preheader_edge (loop_nest[0]),
> +                           loop_containing_stmt (DR_STMT (a)));
> +     }
> +      if (!b->alt_indices.base_object
> +       && TREE_CODE (DR_REF (b)) != MEM_REF)
> +     {
> +       tree alt_ref = build2 (MEM_REF, TREE_TYPE (DR_REF (b)),
> +                              build1 (ADDR_EXPR, ptr_type_node, DR_REF (b)),
> +                              build_int_cst
> +                                (reference_alias_ptr_type (DR_REF (b)), 0));
> +       dr_analyze_indices (&b->alt_indices, alt_ref,
> +                           loop_preheader_edge (loop_nest[0]),
> +                           loop_containing_stmt (DR_STMT (b)));
> +     }
> +      return initialize_data_dependence_relation (res, loop_nest, true);
>      }
>  
>    if (!same_base_p)
> @@ -3381,8 +3406,8 @@ initialize_data_dependence_relation (struct data_reference *a,
>        struct subscript *subscript;
>  
>        subscript = XNEW (struct subscript);
> -      SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i);
> -      SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i);
> +      SUB_ACCESS_FN (subscript, 0) = indices_a->access_fns[full_seq.start_a + i];
> +      SUB_ACCESS_FN (subscript, 1) = indices_b->access_fns[full_seq.start_b + i];
>        SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known ();
>        SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known ();
>        SUB_LAST_CONFLICT (subscript) = chrec_dont_know;
> @@ -3393,6 +3418,40 @@ initialize_data_dependence_relation (struct data_reference *a,
>    return res;
>  }
>  
> +/* Initialize a data dependence relation between data accesses A and
> +   B.  NB_LOOPS is the number of loops surrounding the references: the
> +   size of the classic distance/direction vectors.  */
> +
> +struct data_dependence_relation *
> +initialize_data_dependence_relation (struct data_reference *a,
> +                                  struct data_reference *b,
> +                                  vec<loop_p> loop_nest)
> +{
> +  data_dependence_relation *res = XCNEW (struct data_dependence_relation);
> +  DDR_A (res) = a;
> +  DDR_B (res) = b;
> +  DDR_LOOP_NEST (res).create (0);
> +  DDR_SUBSCRIPTS (res).create (0);
> +  DDR_DIR_VECTS (res).create (0);
> +  DDR_DIST_VECTS (res).create (0);
> +
> +  if (a == NULL || b == NULL)
> +    {
> +      DDR_ARE_DEPENDENT (res) = chrec_dont_know;
> +      return res;
> +    }
> +
> +  /* If the data references do not alias, then they are independent.  */
> +  if (!dr_may_alias_p (a, b, loop_nest.exists () ? loop_nest[0] : NULL))
> +    {
> +      DDR_ARE_DEPENDENT (res) = chrec_known;
> +      return res;
> +    }
> +
> +  return initialize_data_dependence_relation (res, loop_nest, false);
> +}
> +
> +
>  /* Frees memory used by the conflict function F.  */
>  
>  static void
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index 685f33d85ae..74f579c9f3f 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -166,14 +166,19 @@ struct data_reference
>       and runs to completion.  */
>    bool is_conditional_in_stmt;
>  
> +  /* Alias information for the data reference.  */
> +  struct dr_alias alias;
> +
>    /* Behavior of the memory reference in the innermost loop.  */
>    struct innermost_loop_behavior innermost;
>  
>    /* Subscripts of this data reference.  */
>    struct indices indices;
>  
> -  /* Alias information for the data reference.  */
> -  struct dr_alias alias;
> +  /* Alternate subscripts initialized lazily and used by data-dependence
> +     analysis only when the main indices of two DRs are not comparable.
> +     Keep last to keep vec_info_shared::check_datarefs happy.  */
> +  struct indices alt_indices;
>  };
>  
>  #define DR_STMT(DR)                (DR)->stmt
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 3aa3e2a6783..20daa31187d 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -507,7 +507,8 @@ vec_info_shared::check_datarefs ()
>      return;
>    gcc_assert (datarefs.length () == datarefs_copy.length ());
>    for (unsigned i = 0; i < datarefs.length (); ++i)
> -    if (memcmp (&datarefs_copy[i], datarefs[i], sizeof (data_reference)) != 0)
> +    if (memcmp (&datarefs_copy[i], datarefs[i],
> +             offsetof (data_reference, alt_indices)) != 0)
>        gcc_unreachable ();
>  }
