Richard Biener <richard.guent...@gmail.com> writes: > On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford > <richard.sandif...@linaro.org> wrote: >> Richard Biener <richard.guent...@gmail.com> writes: >>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>>> <richard.sandif...@linaro.org> wrote: >>>>> This patch tries to calculate conservatively-correct distance >>>>> vectors for two references whose base addresses are not the same. >>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>>> isn't guaranteed to occur. >>>>> >>>>> The motivating example is: >>>>> >>>>> struct s { int x[8]; }; >>>>> void >>>>> f (struct s *a, struct s *b) >>>>> { >>>>> for (int i = 0; i < 8; ++i) >>>>> a->x[i] += b->x[i]; >>>>> } >>>>> >>>>> in which the "a" and "b" accesses are either independent or have a >>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>>> prevents vectorisation, so we can vectorise without an alias check. >>>>> >>>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>>> >>>>> void >>>>> f (int a[][8], struct b[][8]) >>>>> { >>>>> for (int i = 0; i < 8; ++i) >>>>> a[0][i] += b[0][i]; >>>>> } >>>>> >>>>> I think this is valid because C11 6.7.6.2/6 says: >>>>> >>>>> For two array types to be compatible, both shall have compatible >>>>> element types, and if both size specifiers are present, and are >>>>> integer constant expressions, then both size specifiers shall have >>>>> the same constant value. >>>>> >>>>> So if we access an array through an int (*)[8], it must have type X[8] >>>>> or X[], where X is compatible with int. It doesn't seem possible in >>>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>>> >>>>> However, Richard B said that (at least in gimple) we support arbitrary >>>>> overlap of arrays and allow arrays to be accessed with different >>>>> dimensionality. There are examples of this in PR50067. I've therefore >>>>> only handled references that end in a structure field access. >>>>> >>>>> There are two ways of handling these dependences in the vectoriser: >>>>> use them to limit VF, or check at runtime as before. I've gone for >>>>> the approach of checking at runtime if we can, to avoid limiting VF >>>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>>> aren't allowed. >>>>> >>>>> The patch tests whether we queued an alias check with a dependence >>>>> distance of X and then picked a VF <= X, in which case it's safe to >>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>>> be called twice with different VF for the same loop, it's no longer >>>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>>> comp_alias_ddrs to check whether versioning is necessary. >>>>> >>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>>> >>>> You seem to do your "fancy" thing but also later compute the old >>>> base equality anyway (for same_base_p). It looks to me for this >>>> case the new fancy code can be simply skipped, keeping num_dimensions >>>> as before? >>>> >>>> + /* Try to approach equal type sizes. */ >>>> + if (!COMPLETE_TYPE_P (type_a) >>>> + || !COMPLETE_TYPE_P (type_b) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>>> + break; >>>> >>>> ah, interesting idea to avoid a quadratic search. Note that you should >>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>>> as they are used for type-punning. >> >> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, >> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that >> dr_analyze_indices allows, so I think we safe in terms of the tree codes. > > Yeah. I think we need to document that we should have a 1:1 match here.
OK, I added that to the comments and also added an access_fn_component_p that we can assert on. >>>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>>> ARRAY_REFs - but I also see there: >>>> >>>> /* ??? We cannot simply use the type of operand #0 of the refs here >>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>>> for common blocks instead of using unions like everyone else. */ >>>> tree type1 = DECL_CONTEXT (field1); >>>> tree type2 = DECL_CONTEXT (field2); >>>> >>>> so you probably can't simply use TREE_TYPE (outer_ref) for type >>>> compatibility. >>>> You also may not use types_compatible_p here as for LTO that is _way_ too >>>> lax for aggregates. The above uses >>>> >>>> /* We cannot disambiguate fields in a union or qualified union. */ >>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>>> return false; >>>> >>>> so you should also bail out on unions here, rather than the check you do >>>> later. >> >> The loop stops before we get to a union, so I think "only" the RECORD_TYPE >> COMPONENT_REF handling is a potential problem. Does this mean that >> I should use the nonoverlapping_component_refs_of_decl_p code: >> >> tree field1 = TREE_OPERAND (ref1, 1); >> tree field2 = TREE_OPERAND (ref2, 1); >> >> /* ??? We cannot simply use the type of operand #0 of the refs here >> as the Fortran compiler smuggles type punning into COMPONENT_REFs >> for common blocks instead of using unions like everyone else. */ >> tree type1 = DECL_CONTEXT (field1); >> tree type2 = DECL_CONTEXT (field2); >> >> /* We cannot disambiguate fields in a union or qualified union. */ >> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >> return false; >> >> if (field1 != field2) >> { >> /* A field and its representative need to be considered the >> same. */ >> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 >> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) >> return false; >> /* Different fields of the same record type cannot overlap. >> ??? Bitfields can overlap at RTL level so punt on them. */ >> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) >> return false; >> return true; >> } >> >> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p >> during the new loop? > > Yes. OTOH you want to "match" while the above disambiguates. So it means > you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL > equality (which should be the same in the end). The RTL concern > should not matter > here. The attached patch adds an access_fn_components_comparable_p helper function that checks whether the DECL_CONTEXTs are the same. >> And test for this as well as unions in the outer >> references? > > So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, > and you (should) stop the ref walk at DR_BASE_OBJECT. I was just thinking that if the Fortran front-end has cases in which TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1)) for a COMPONENT_REF, should we treat that as equivalent to a union access in ref_contains_union_access_p? But I'm not sure that's necessary after all. > The dr_analyze_indices code is also somewhat fishy in that it simply > ignores everything below unhandled component-refs even if there are > indices involved (and it gets away with this because dependence > analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test > in case it is sth like a[i].union for example ... hopefully ...). I think this is what you meant, but: I don't think the base object itself can be a union, because we need at least one component reference for the DR, and don't accept COMPONENT_REFs for unions as access functions. So if the base involves a union, the base would also need to have a COMPONENT_REF that selects a particular member of that union. And yeah, before the patch we did allow a dependence distance to be calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do after the patch), on the basis that a[i].union.f refers to the same object in both cases. >>>> You seem to rely on getting an access_fn entry for each >>>> handled_component_p. >>>> It looks like this is the case -- we even seem to stop at unions >>>> (with the same >>>> fortran "issue"). I'm not sure that's the best thing to do but you >>>> rely on that. >> >> Yeah, the loop is deliberately limited to the components associated with >> an access_fn. I did wonder at first whether dr_analyze_indices should >> store the original component reference trees for each access function. >> That would make things simpler and more explicit, but would also eat up >> more memory. Things like object_address_invariant_in_loop_p rely on the >> access_fns in the same way that the loop in the patch does. > > in fact it fails to handle ARRAY_RANGE_REFs ... Yeah, the whole file seems to ignore those. What kind of code would benefit? >>>> I don't understand the looping, it needs more comments. You seem to be >>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>>> is how many compatible refs you found on the way (with incompatible ones >>>> not counting?!). What about an inner varying array of structs? >>>> This seems to >>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>>> b[i].s.b[i].j? >> >> I'll try to improve the comments. But the idea is that both sequences are >> as long as possible, while that still gives compatible types. If there is >> more than one such sequence, we pick the one nearest the base. >> >> So in your example, the access functions would be: >> >> 0 1 2 3 4 >> a: .j [i] .b .s [i] >> >> 0 1 2 3 4 5 >> b: __real .j [i] .b .s [i] >> >> If a and b are pointers, the final access functions would be >> unconstrained base accesses, so we'd end up with: >> >> a: [0, 3] >> b: [1, 4] >> >> for both sequences. >> >>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>>> conveniently start from the other >>>> end of the ref here. >>> >>> That said, for the motivational cases we either have one ref having >>> more dimensions than the other (the __real vs. full complex access) or >>> they have the same number of dimensions (and no access fn for the >>> base). >>> >>> For the first case we should simply "drop" access_fns of the larger >>> dimensional ref (from the start, plus outer component refs) up to the >>> point the number of dimensions are equal. >> >> Yeah, that's what happens for your example. But if we had: >> >> a[i].s.c.d >> __real b[i].s.b[i].j >> >> (where d is the same type as the real component) then the access >> functions would be: >> >> 0 1 2 3 >> a: .d .c .s [i] >> >> 0 1 2 3 4 5 >> b: __real .j [i] .b .s [i] >> >> Comparing the a0/b2 column doesn't make sense, because one's an array >> and the other is a structure. In this case the sequence we care about is: >> >> a: [1, 3] >> b: [3, 5] >> >> which is what the loop gives. The a1/b3 column is the one that proves >> there's no dependence. >> >>> Then we have the case of >>> >>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >>> >>> where we have to punt. >>> >>> Then we have the case of >>> >>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>> >>> which is where the new code should kick in to see if we can drop access_fns >>> from the other end (as unanalyzable but either having distance zero or not >>> aliased because of TBAA). >>> >>> At least your testcases suggest you do not want to handle >>> >>> struct s { int x[N]; }; >>> struct r { struct s s; }; >>> f (struct s *a, struct r *b) >>> { >>> for (i = 0; i < N; ++i) >>> a->s.x[i] = b->x[i]; >>> } >>> >>> ? >>> >>> With this example your loop which seems to search for a "common" >>> sequence in (different) midst of the reference trees makes more sense >>> (still that loop is awkward to understand). >> >> Yeah, I want to handle that too, just hadn't thought of it as a specific >> testcase. The code does give the expected dependence distance of 0. > > Ok. > > I think the patch is reasonable, maybe the loop can be restructured / > simplified a bit and handling of the union case for example be done > first (by looking at DR_BASE_OBJECT). I still prefer doing the loop first and keeping the "same base" check together as a single condition, since it means that we're analysing the reference in a single direction (DR_REF to base) rather than jumping around. And unequal bases should be more common that equal ones. I think both orders involve doing potentially redundant work. The current order tends towards doing redundant work for union accesses and !flag_strict_aliasing, but they should be the less common cases. How does this look? Changes since v1: - Added access_fn_component_p to check for valid access function components. - Added access_fn_components_comparable_p instead of using types_compatibloe_p directly. - Added more commentary. - Added local structures to represent the sequence, so that it's more obvious which variables are temporaries and which aren't. - Added the test above to vect-alias-check-3.c. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Thanks, Richard 2017-05-18 Richard Sandiford <richard.sandif...@linaro.org> gcc/ * tree-data-ref.h (subscript): Add access_fn field. (data_dependence_relation): Add could_be_independent_p. (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. (same_access_functions): Move to tree-data-ref.c. * tree-data-ref.c (ref_contains_union_access_p): New function. (access_fn_component_p): Likewise. (access_fn_components_comparable_p): Likewise. (dr_analyze_indices): Add a comment that this code needs to be kept in sync with access_fn_component_p. (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (constant_access_functions): Likewise. (add_other_self_distances): Likewise. (same_access_functions): Likewise. (Moved from tree-data-ref.h.) (initialize_data_dependence_relation): Use XCNEW and remove explicit zeroing of DDR_REVERSED_P. Look for a subsequence of access functions that have the same type. Allow the subsequence to end with different bases in some circumstances. Record the chosen access functions in SUB_ACCESS_FN. (build_classic_dist_vector_1): Replace ddr_a and ddr_b with a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (subscript_dependence_tester_1): Likewise dra and drb. (build_classic_dist_vector): Update calls accordingly. (subscript_dependence_tester): Likewise. * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check DDR_COULD_BE_INDEPENDENT_P. * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded distance vectors if that fails. (dependence_distance_ge_vf): New function. (vect_prune_runtime_alias_test_list): Use it. Don't clear LOOP_VINFO_MAY_ALIAS_DDRS. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-3.c: New test. * gcc.dg/vect/vect-alias-check-4.c: Likewise. * gcc.dg/vect/vect-alias-check-5.c: Likewise. Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-05-04 11:36:51.157328631 +0100 +++ gcc/tree-data-ref.h 2017-05-18 07:51:50.871904726 +0100 @@ -191,6 +191,9 @@ struct conflict_function struct subscript { + /* The access functions of the two references. */ + tree access_fn[2]; + /* A description of the iterations for which the elements are accessed twice. */ conflict_function *conflicting_iterations_in_a; @@ -209,6 +212,7 @@ struct subscript typedef struct subscript *subscript_p; +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict @@ -264,6 +268,33 @@ struct data_dependence_relation /* Set to true when the dependence relation is on the same data access. */ bool self_reference_p; + + /* True if the dependence described is conservatively correct rather + than exact, and if it is still possible for the accesses to be + conditionally independent. For example, the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a->f[i] += b->f[i]; + + conservatively have a distance vector of (0), for the case in which + a == b, but the accesses are independent if a != b. Similarly, + the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a[0].f[i] += b[i].f[i]; + + conservatively have a distance vector of (0), but they are indepenent + when a != b + i. In contrast, the references in: + + struct s *a; + for (int i = 0; i < n; ++i) + a->f[i] += a->f[i]; + + have the same distance vector of (0), but the accesses can never be + independent. */ + bool could_be_independent_p; }; typedef struct data_dependence_relation *ddr_p; @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ #define DDR_DIST_VECT(DDR, I) \ DDR_DIST_VECTS (DDR)[I] #define DDR_REVERSED_P(DDR) (DDR)->reversed_p +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p bool dr_analyze_innermost (struct data_reference *, struct loop *); @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data return false; return true; -} - -/* Return true when the DDR contains two data references that have the - same access functions. */ - -static inline bool -same_access_functions (const struct data_dependence_relation *ddr) -{ - unsigned i; - - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), - DR_ACCESS_FN (DDR_B (ddr), i))) - return false; - - return true; } /* Returns true when all the dependences are computable. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-05-18 07:51:26.126377691 +0100 +++ gcc/tree-data-ref.c 2017-05-18 07:51:50.871904726 +0100 @@ -123,8 +123,7 @@ Software Foundation; either version 3, o } dependence_stats; static bool subscript_dependence_tester_1 (struct data_dependence_relation *, - struct data_reference *, - struct data_reference *, + unsigned int, unsigned int, struct loop *); /* Returns true iff A divides B. */ @@ -144,6 +143,21 @@ int_divides_p (int a, int b) return ((b % a) == 0); } +/* Return true if reference REF contains a union access. */ + +static bool +ref_contains_union_access_p (tree ref) +{ + while (handled_component_p (ref)) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) + return true; + } + return false; +} + /* Dump into FILE all the data references from DATAREFS. */ @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out unsigned int i; struct loop *loopi; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + subscript *sub; + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { fprintf (outf, " access_fn_A: "); - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); fprintf (outf, " access_fn_B: "); - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); + dump_subscript (outf, sub); } fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere return true; } +/* Return true if OP is a valid component reference for a DR access + function. This accepts a subset of what handled_component_p accepts. */ + +static bool +access_fn_component_p (tree op) +{ + switch (TREE_CODE (op)) + { + case REALPART_EXPR: + case IMAGPART_EXPR: + case ARRAY_REF: + return true; + + case COMPONENT_REF: + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; + + default: + return false; + } +} + /* Determines the base object and the list of indices of memory reference DR, analyzed in LOOP and instantiated in loop nest NEST. */ @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc access_fns.safe_push (integer_one_node); } - /* Analyze access functions of dimensions we know to be independent. */ + /* Analyze access functions of dimensions we know to be independent. + The list of component references handled here should be kept in + sync with access_fn_component_p. */ while (handled_component_p (ref)) { if (TREE_CODE (ref) == ARRAY_REF) @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere return refs_may_alias_p (addr_a, addr_b); } +/* REF_A and REF_B both satisfy access_fns_comparable_p. Return true + if it is meaningful to compare their associated access functions + when checking for dependencies. */ + +static bool +access_fn_components_comparable_p (tree ref_a, tree ref_b) +{ + if (TREE_CODE (ref_a) != TREE_CODE (ref_b)) + return false; + + if (TREE_CODE (ref_a) == COMPONENT_REF) + /* ??? We cannot simply use the type of operand #0 of the refs here as + the Fortran compiler smuggles type punning into COMPONENT_REFs. + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); + + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), + TREE_TYPE (TREE_OPERAND (ref_b, 0))); +} + /* Initialize a data dependence relation between data accesses A and B. NB_LOOPS is the number of loops surrounding the references: the size of the classic distance/direction vectors. */ @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str struct data_dependence_relation *res; unsigned int i; - res = XNEW (struct data_dependence_relation); + res = XCNEW (struct data_dependence_relation); DDR_A (res) = a; DDR_B (res) = b; DDR_LOOP_NEST (res).create (0); - DDR_REVERSED_P (res) = false; DDR_SUBSCRIPTS (res).create (0); DDR_DIR_VECTS (res).create (0); DDR_DIST_VECTS (res).create (0); @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str return res; } - /* The case where the references are exactly the same. */ - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + if (num_dimensions_a == 0 || num_dimensions_b == 0) { - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], - DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* For unconstrained bases, the root (highest-indexed) subscript + describes a variation in the base of the original DR_REF rather + than a component access. We have no type that accurately describes + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* + applying this subscript) so limit the search to the last real + component access. + + E.g. for: + + void + f (int a[][8], int b[][8]) + { + for (int i = 0; i < 8; ++i) + a[i * 2][0] = b[i][0]; + } + + the a and b accesses have a single ARRAY_REF component reference [0] + but have two subscripts. */ + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions_a -= 1; + if (DR_UNCONSTRAINED_BASE (b)) + num_dimensions_b -= 1; + + /* These structures describe sequences of component references in + DR_REF (A) and DR_REF (B). Each component reference is tied to a + specific access function. */ + struct { + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher + indices. In C notation, these are the indices of the rightmost + component references; e.g. for a sequence .b.c.d, the start + index is for .d. */ + unsigned int start_a; + unsigned int start_b; + + /* The sequence contains LENGTH consecutive access functions from + each DR. */ + unsigned int length; + + /* The enclosing objects for the A and B sequences respectively, + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ + tree object_a; + tree object_b; + } full_seq = {}, struct_seq = {}; + + /* Before each iteration of the loop: + + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ + unsigned int index_a = 0; + unsigned int index_b = 0; + tree ref_a = DR_REF (a); + tree ref_b = DR_REF (b); + + /* Now walk the component references from the final DR_REFs back up to + the enclosing base objects. Each component reference corresponds + to one access function in the DR, with access function 0 being for + the final DR_REF and the highest-indexed access function being the + one that is applied to the base of the DR. + + Look for a sequence of component references whose access functions + are comparable (see access_fn_components_comparable_p). If more + than one such sequence exists, pick the one nearest the base + (which is the leftmost sequence in C notation). Store this sequence + in FULL_SEQ. + + For example, if we have: + + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; + + A: a[0][i].s.c.d + B: __real b[0][i].s.e[i].f + + (where d is the same type as the real component of f) then the access + functions would be: + + 0 1 2 3 + A: .d .c .s [i] + + 0 1 2 3 4 5 + B: __real .f [i] .e .s [i] + + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF + and [i] is an ARRAY_REF. However, the A1/B3 column contains two + COMPONENT_REF accesses for struct bar, so is comparable. Likewise + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, + so is comparable. The A3/B5 column contains two ARRAY_REFs that + index foo[10] arrays, so is again comparable. The sequence is + therefore: + + A: [1, 3] (i.e. [i].s.c) + B: [3, 5] (i.e. [i].s.e) + + Also look for sequences of component references whose access + functions are comparable and whose enclosing objects have the same + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above + example, STRUCT_SEQ would be: + + A: [1, 2] (i.e. s.c) + B: [3, 4] (i.e. s.e) */ + while (index_a < num_dimensions_a && index_b < num_dimensions_b) + { + /* REF_A and REF_B must be one of the component access types + allowed by dr_analyze_indices. */ + gcc_checking_assert (access_fn_component_p (ref_a)); + gcc_checking_assert (access_fn_component_p (ref_b)); + + /* Get the immediately-enclosing objects for REF_A and REF_B, + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) + and DR_ACCESS_FN (B, INDEX_B). */ + tree object_a = TREE_OPERAND (ref_a, 0); + tree object_b = TREE_OPERAND (ref_b, 0); + + tree type_a = TREE_TYPE (object_a); + tree type_b = TREE_TYPE (object_b); + if (access_fn_components_comparable_p (ref_a, ref_b)) + { + /* This pair of component accesses is comparable for dependence + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ + if (full_seq.start_a + full_seq.length != index_a + || full_seq.start_b + full_seq.length != index_b) + { + /* The accesses don't extend the current sequence, + so start a new one here. */ + full_seq.start_a = index_a; + full_seq.start_b = index_b; + full_seq.length = 0; + } + + /* Add this pair of references to the sequence. */ + full_seq.length += 1; + full_seq.object_a = object_a; + full_seq.object_b = object_b; + + /* If the enclosing objects are structures (and thus have the + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ + if (TREE_CODE (type_a) == RECORD_TYPE) + struct_seq = full_seq; + + /* Move to the next containing reference for both A and B. */ + ref_a = object_a; + ref_b = object_b; + index_a += 1; + index_b += 1; + continue; + } + + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; + + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); + if (size_a <= size_b) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + index_a += 1; + ref_a = object_a; + } + if (size_b <= size_a) + { + index_b += 1; + ref_b = object_b; } - DDR_AFFINE_P (res) = true; - DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); - DDR_LOOP_NEST (res) = loop_nest; - DDR_INNER_LOOP (res) = 0; - DDR_SELF_REFERENCE (res) = true; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) - { - struct subscript *subscript; - - subscript = XNEW (struct subscript); - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; - SUB_DISTANCE (subscript) = chrec_dont_know; - DDR_SUBSCRIPTS (res).safe_push (subscript); - } - return res; } - /* If the references do not access the same object, we do not know - whether they alias or not. We do not care about TBAA or alignment - info so we can use OEP_ADDRESS_OF to avoid false negatives. - But the accesses have to use compatible types as otherwise the - built indices would not match. */ - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), - TREE_TYPE (DR_BASE_OBJECT (b)))) + /* See whether FULL_SEQ ends at the base and whether the two bases + are equal. We do not care about TBAA or alignment info so we can + use OEP_ADDRESS_OF to avoid false negatives. */ + tree base_a = DR_BASE_OBJECT (a); + tree base_b = DR_BASE_OBJECT (b); + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a + && full_seq.start_b + full_seq.length == num_dimensions_b + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) + && types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b)) + && (!loop_nest.exists () + || (object_address_invariant_in_loop_p + (loop_nest[0], base_a)))); + + /* If the bases are the same, we can include the base variation too. + E.g. the b accesses in: + + for (int i = 0; i < n; ++i) + b[i + 4][0] = b[i][0]; + + have a definite dependence distance of 4, while for: + + for (int i = 0; i < n; ++i) + a[i + 4][0] = b[i][0]; + + the dependence distance depends on the gap between a and b. + + If the bases are different then we can only rely on the sequence + rooted at a structure access, since arrays are allowed to overlap + arbitrarily and change shape arbitrarily. E.g. we treat this as + valid code: + + int a[256]; + ... + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; + + where two lvalues with the same int[4][3] type overlap, and where + both lvalues are distinct from the object's declared type. */ + if (same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (DR_UNCONSTRAINED_BASE (a)) + full_seq.length += 1; } + else + full_seq = struct_seq; - /* If the base of the object is not invariant in the loop nest, we cannot - analyze it. TODO -- in fact, it would suffice to record that there may - be arbitrary dependences in the loops where the base object varies. */ - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + /* Punt if we didn't find a suitable sequence. */ + if (full_seq.length == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; return res; } - /* If the number of dimensions of the access to not agree we can have - a pointer access to a component of the array element type and an - array access while the base-objects are still the same. Punt. */ - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) + if (!same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + /* Partial overlap is possible for different bases when strict aliasing + is not in effect. It's also possible if either base involves a union + access; e.g. for: + + struct s1 { int a[2]; }; + struct s2 { struct s1 b; int c; }; + struct s3 { int d; struct s1 e; }; + union u { struct s2 f; struct s3 g; } *p, *q; + + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at + "p->g.e" (base "p->g") and might partially overlap the s1 at + "q->g.e" (base "q->g"). */ + if (!flag_strict_aliasing + || ref_contains_union_access_p (full_seq.object_a) + || ref_contains_union_access_p (full_seq.object_b)) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + DDR_COULD_BE_INDEPENDENT_P (res) = true; } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); + DDR_SUBSCRIPTS (res).create (full_seq.length); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = 0; i < full_seq.length; ++i) { struct subscript *subscript; subscript = XNEW (struct subscript); + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende } /* Return false when fail to represent the data dependence as a - distance vector. INIT_B is set to true when a component has been + distance vector. A_INDEX is the index of the first reference + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the + second reference. INIT_B is set to true when a component has been added to the distance vector DIST_V. INDEX_CARRY is then set to the index in DIST_V that carries the dependence. */ static bool build_classic_dist_vector_1 (struct data_dependence_relation *ddr, - struct data_reference *ddr_a, - struct data_reference *ddr_b, + unsigned int a_index, unsigned int b_index, lambda_vector dist_v, bool *init_b, int *index_carry) { @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data return false; } - access_fn_a = DR_ACCESS_FN (ddr_a, i); - access_fn_b = DR_ACCESS_FN (ddr_b, i); + access_fn_a = SUB_ACCESS_FN (subscript, a_index); + access_fn_b = SUB_ACCESS_FN (subscript, b_index); if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data constant_access_functions (const struct data_dependence_relation *ddr) { unsigned i; + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) return false; return true; @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de lambda_vector dist_v; unsigned i; int index_carry = DDR_NB_LOOPS (ddr); + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); + tree access_fun = SUB_ACCESS_FN (sub, 0); if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) { @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de return; } - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) add_multivariate_self_dist (ddr, access_fun); @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d } } +/* Return true when the DDR contains two data references that have the + same access functions. */ + +static inline bool +same_access_functions (const struct data_dependence_relation *ddr) +{ + unsigned i; + subscript *sub; + + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), + SUB_ACCESS_FN (sub, 1))) + return false; + + return true; +} + /* Compute the classic per loop distance vector. DDR is the data dependence relation to build a vector from. Return false when fail to represent the data dependence as a distance vector. */ @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d } dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), - dist_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) return false; /* Save the distance vector if we initialized one. */ @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), - loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - save_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, + &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d { lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), - DDR_A (ddr), loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - opposite_v, &init_b, + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, &index_carry)) return false; @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de } } -/* Helper function. Returns true when there is a dependence between - data references DRA and DRB. */ +/* Helper function. Returns true when there is a dependence between the + data references. A_INDEX is the index of the first reference (0 for + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ static bool subscript_dependence_tester_1 (struct data_dependence_relation *ddr, - struct data_reference *dra, - struct data_reference *drb, + unsigned int a_index, unsigned int b_index, struct loop *loop_nest) { unsigned int i; @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da { conflict_function *overlaps_a, *overlaps_b; - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), - DR_ACCESS_FN (drb, i), + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), + SUB_ACCESS_FN (subscript, b_index), &overlaps_a, &overlaps_b, &last_conflicts, loop_nest); @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da subscript_dependence_tester (struct data_dependence_relation *ddr, struct loop *loop_nest) { - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) dependence_stats.num_dependence_dependent++; compute_subscript_distance (ddr); Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:26.127377591 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:50.871904726 +0100 @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * refb = (struct mem_ref *) DDR_B (dep)->aux; if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know + || DDR_COULD_BE_INDEPENDENT_P (dep) || DDR_NUM_DIST_VECTS (dep) == 0) { /* If the dependence cannot be analyzed, assume that there might be Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-05-18 07:51:26.128377491 +0100 +++ gcc/tree-vectorizer.h 2017-05-18 07:51:50.872904626 +0100 @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ ((L)->may_misalign_stmts.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ - ((L)->may_alias_ddrs.length () > 0) + ((L)->comp_alias_ddrs.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) #define LOOP_REQUIRES_VERSIONING(L) \ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-05-18 07:51:23.307659382 +0100 +++ gcc/tree-vect-data-refs.c 2017-05-18 07:51:50.872904626 +0100 @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct } loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); + + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) + /* For dependence distances of 2 or more, we have the option of + limiting VF or checking for an alias at runtime. Prefer to check + at runtime if we can, to avoid limiting the VF unnecessarily when + the bases are in fact independent. + + Note that the alias checks will be removed if the VF ends up + being small enough. */ + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + int dist = dist_v[loop_depth]; + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) + { + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) + return false; + break; + } + } + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * return false; } +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH + in DDR is >= VF. */ + +static bool +dependence_distance_ge_vf (data_dependence_relation *ddr, + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) +{ + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE + || DDR_NUM_DIST_VECTS (ddr) == 0) + return false; + + /* If the dependence is exact, we should have limited the VF instead. */ + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); + + unsigned int i; + lambda_vector dist_v; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + HOST_WIDE_INT dist = dist_v[loop_depth]; + if (dist != 0 + && !(dist > 0 && DDR_REVERSED_P (ddr)) + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) + return false; + } + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "dependence distance between "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); + dump_printf (MSG_NOTE, " is >= VF\n"); + } + + return true; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop comp_alias_ddrs.create (may_alias_ddrs.length ()); + unsigned int loop_depth + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, + LOOP_VINFO_LOOP_NEST (loop_vinfo)); + /* First, we collect all data ref pairs for aliasing checks. */ FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop tree segment_length_a, segment_length_b; gimple *stmt_a, *stmt_b; + /* Ignore the alias if the VF we chose ended up being no greater + than the dependence distance. */ + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) + continue; + dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop return false; } - /* All alias checks have been resolved at compilation time. */ - if (!comp_alias_ddrs.length ()) - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return true; } Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,112 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N + 1]; }; +struct t { struct s x[N + 1]; }; +struct u { int x[N + 1]; int y; }; +struct v { struct s s; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i]; +} + +void +f2 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i]; +} + +void +f3 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i]; +} + +void +f4 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i]; +} + +void +f5 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i + 1]; +} + +void +f6 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i + 1]; +} + +void +f7 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i + 1]; +} + +void +f8 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i + 1]; +} + +void +f9 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[1].x[i]; +} + +void +f10 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i].x[i]; +} + +void +f11 (struct u *a, struct u *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i] + b[i].y; +} + +void +f12 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f13 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f14 (struct v *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->s.x[i] = b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +#define N 16 + +struct s1 { int a[N]; }; +struct s2 { struct s1 b; int c; }; +struct s3 { int d; struct s1 e; }; +union u { struct s2 f; struct s3 g; }; + +/* We allow a and b to overlap arbitrarily. */ + +void +f1 (int a[][N], int b[][N]) +{ + for (int i = 0; i < N; ++i) + a[0][i] += b[0][i]; +} + +void +f2 (union u *a, union u *b) +{ + for (int i = 0; i < N; ++i) + a->f.b.a[i] += b->g.e.a[i]; +} + +void +f3 (struct s1 *a, struct s1 *b) +{ + for (int i = 0; i < N - 1; ++i) + a->a[i + 1] += b->a[i]; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N]; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */