GCC currently uses either peeling or versioning, but not both in
combination, to handle unaligned data references (DRs) during
vectorization.  This limitation causes some early-break loops to fall
back to scalar code at runtime.

Consider the following loop with DRs in its early-break condition:

        for (int i = start; i < end; i++) {
          if (a[i] == b[i])
            break;
          count++;
        }

In this loop, the references to a[] and b[] must be strictly aligned for
vectorization, because speculative reads that may cross a page boundary
are not allowed.  GCC currently versions this loop by creating a runtime
check like:

        ((&a[start] | &b[start]) & mask) == 0

to test whether the two starting addresses both have their low bits
clear.  If this runtime check fails, the loop falls back to scalar code.
However, it often happens that the DRs are all unaligned at the start of
the loop but become aligned together after a few iterations.  We call
such DRs "mutually aligned".

This patch enables combined peeling and versioning so that loops with
mutually aligned DRs no longer fall back to scalar code.  Specifically,
vect_peeling_supportable is updated to return a three-state enum that
indicates whether peeling can make all unsupportable DRs aligned.  In
addition to the previous true/false return values, the new state
peeling_maybe_supported indicates that peeling may be able to align
these DRs but we cannot prove it at compile time.  In that case, peeling
is combined with versioning so that a runtime check is generated to
guard the peeled vectorized loop.

A new type of runtime check is also introduced for combined peeling and
versioning.  It is enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is
true and tests whether all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS
have the same low address bits.  For the loop above, the new test XORs
the two addresses, generating a check like:

        ((&a[start] ^ &b[start]) & mask) == 0

Therefore, if a and b have the same alignment step (element size) and
the same offset from an alignment boundary, the peeled vectorized loop
will run.  The new runtime check also works for more than two DRs, with
the LHS expression being:

        ((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask

where ai is the address of the i'th DR.
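
As a sketch only (hand-written C for clarity, not the GIMPLE the patch
emits; the helper name is made up), the generalized check is equivalent
to:

        #include <stdint.h>

        /* Return nonzero if all N addresses share the same low bits under
           MASK, i.e. they are mutually aligned and a single peeling amount
           can align them all.  */
        static int
        mutually_aligned_p (char *const *addrs, int n, uintptr_t mask)
        {
          uintptr_t acc = 0;
          for (int i = 1; i < n; i++)
            acc |= (uintptr_t) addrs[i - 1] ^ (uintptr_t) addrs[i];
          return (acc & mask) == 0;
        }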

This patch has been bootstrapped and regression-tested on
x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu.

gcc/ChangeLog:

        * tree-vect-data-refs.cc (vect_peeling_supportable): Return new
        enum values to indicate if combined peeling and versioning can
        potentially support vectorization.
        (vect_enhance_data_refs_alignment): Support combined peeling and
        versioning in vectorization analysis.
        * tree-vect-loop-manip.cc (vect_create_cond_for_align_checks):
        Add a new type of runtime check for mutually aligned DRs.
        * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set
        default value of allow_mutual_alignment in the initializer list.
        * tree-vectorizer.h (enum peeling_support): Define type of
        peeling support for function vect_peeling_supportable.
        (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.
---
 .../gcc.dg/vect/vect-early-break_133_pfa6.c   |   2 +-
 gcc/tree-vect-data-refs.cc                    | 168 ++++++++++++++----
 gcc/tree-vect-loop-manip.cc                   |  96 +++++++---
 gcc/tree-vect-loop.cc                         |   1 +
 gcc/tree-vectorizer.h                         |  16 ++
 5 files changed, 222 insertions(+), 61 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
index ee123df6ed2..7787d037d9d 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_133_pfa6.c
@@ -20,4 +20,4 @@ unsigned test4(char x, char *vect_a, char *vect_b, int n)
  return ret;
 }
 
-/* { dg-final { scan-tree-dump "Versioning for alignment will be applied" "vect" } } */
+/* { dg-final { scan-tree-dump "Both peeling and versioning will be applied" "vect" } } */
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 1792ee4ea05..befdbff29f3 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -2111,9 +2111,10 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
    return res;
 }
 
-/* Return true if the new peeling NPEEL is supported.  */
+/* Return whether vectorization is definitely, possibly, or unlikely to be
+   supportable after loop peeling.  */
 
-static bool
+static enum peeling_support
 vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
                          unsigned npeel)
 {
@@ -2123,8 +2124,11 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
   bool dr0_alignment_known_p
     = known_alignment_for_access_p (dr0_info,
                                    STMT_VINFO_VECTYPE (dr0_info->stmt));
+  bool has_unsupported_dr_p = false;
+  unsigned int dr0_step = DR_STEP_ALIGNMENT (dr0_info->dr);
+  int known_unsupported_misalignment = DR_MISALIGNMENT_UNKNOWN;
 
-  /* Ensure that all data refs can be vectorized after the peel.  */
+  /* Check if each data ref can be vectorized after peeling.  */
   for (data_reference *dr : datarefs)
     {
       if (dr == dr0_info->dr)
@@ -2152,10 +2156,44 @@ vect_peeling_supportable (loop_vec_info loop_vinfo, dr_vec_info *dr0_info,
        = vect_supportable_dr_alignment (loop_vinfo, dr_info, vectype,
                                         misalignment);
       if (supportable_dr_alignment == dr_unaligned_unsupported)
-       return false;
+       {
+         has_unsupported_dr_p = true;
+
+         /* If unaligned unsupported DRs exist, do the following checks to see
+            if they can be mutually aligned to support vectorization.  If so,
+            we can try peeling and create a runtime (mutual alignment) check
+            to guard the peeled loop.  If not, return PEELING_UNSUPPORTED.  */
+
+         /* 1) If unaligned unsupported DRs have different alignment steps, the
+               probability of DRs being mutually aligned is very low, and it's
+               quite complex to check mutual alignment at runtime.  We return
+               PEELING_UNSUPPORTED in this case.  */
+         if (DR_STEP_ALIGNMENT (dr) != dr0_step)
+           return peeling_unsupported;
+
+         /* 2) Based on above same alignment step condition, if one known
+               misaligned DR has zero misalignment, or different misalignment
+               amount from another known misaligned DR, peeling is unable to
+               help make all these DRs aligned together.  We won't try peeling
+               with versioning anymore.  */
+         int curr_dr_misalignment = dr_misalignment (dr_info, vectype);
+         if (curr_dr_misalignment == 0)
+           return peeling_unsupported;
+         if (known_unsupported_misalignment != DR_MISALIGNMENT_UNKNOWN)
+           {
+             if (curr_dr_misalignment != DR_MISALIGNMENT_UNKNOWN
+                 && curr_dr_misalignment != known_unsupported_misalignment)
+               return peeling_unsupported;
+           }
+         else
+           known_unsupported_misalignment = curr_dr_misalignment;
+       }
     }
 
-  return true;
+  /* Vectorization is known to be supportable with peeling alone when there is
+     no unsupported DR.  */
+  return has_unsupported_dr_p ? peeling_maybe_supported
+                             : peeling_known_supported;
 }
 
 /* Compare two data-references DRA and DRB to group them into chunks
@@ -2264,20 +2302,20 @@ dr_align_group_sort_cmp (const void *dra_, const void *drb_)
      }
 
      -- Possibility 3: combination of loop peeling and versioning:
-     for (i = 0; i < 3; i++){  # (scalar loop, not to be vectorized).
-       x = q[i];
-       p[i] = y;
-     }
-     if (p is aligned) {
-       for (i = 3; i<N; i++){  # loop 3A
+     if (p & q are mutually aligned) {
+       for (i=0; i<3; i++){    # (peeled loop iterations).
+         x = q[i];
+         p[i] = y;
+       }
+       for (i=3; i<N; i++){    # loop 3A
          x = q[i];                     # DR_MISALIGNMENT(q) = 0
          p[i] = y;                     # DR_MISALIGNMENT(p) = 0
        }
      }
      else {
-       for (i = 3; i<N; i++){  # loop 3B
-         x = q[i];                     # DR_MISALIGNMENT(q) = 0
-         p[i] = y;                     # DR_MISALIGNMENT(p) = unaligned
+       for (i=0; i<N; i++){    # (scalar loop, not to be vectorized).
+         x = q[i];                     # DR_MISALIGNMENT(q) = 3
+         p[i] = y;                     # DR_MISALIGNMENT(p) = unknown
        }
      }
 
@@ -2296,6 +2334,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   unsigned int i;
   bool do_peeling = false;
   bool do_versioning = false;
+  bool try_peeling_with_versioning = false;
   unsigned int npeel = 0;
   bool one_misalignment_known = false;
   bool one_misalignment_unknown = false;
@@ -2361,30 +2400,38 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
   /* While cost model enhancements are expected in the future, the high level
      view of the code at this time is as follows:
 
-     A) If there is a misaligned access then see if peeling to align
-        this access can make all data references satisfy
-        vect_supportable_dr_alignment.  If so, update data structures
-        as needed and return true.
+     A) If there is a misaligned access then see if doing peeling alone can
+       make all data references satisfy vect_supportable_dr_alignment.  If so,
+       update data structures and return.
+
+     B) If peeling alone wasn't possible and there is a data reference with an
+       unknown misalignment that does not satisfy vect_supportable_dr_alignment
+       then we may use either of the following two approaches.
 
-     B) If peeling wasn't possible and there is a data reference with an
-        unknown misalignment that does not satisfy vect_supportable_dr_alignment
-        then see if loop versioning checks can be used to make all data
-        references satisfy vect_supportable_dr_alignment.  If so, update
-        data structures as needed and return true.
+       B1) Try peeling with versioning: Add a runtime loop versioning check to
+           see if all unsupportable data references are mutually aligned, which
+           means they will be uniformly aligned after a certain amount of loop
+           peeling.  If peeling and versioning can be used together, set
+           LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT to TRUE and return.
 
-     C) If neither peeling nor versioning were successful then return false if
-        any data reference does not satisfy vect_supportable_dr_alignment.
+       B2) Try versioning alone: Add a runtime loop versioning check to see if
+           all unsupportable data references are already uniformly aligned
+           without loop peeling.  If versioning can be applied alone, set
+           LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT to FALSE and return.
 
-     D) Return true (all data references satisfy vect_supportable_dr_alignment).
+       B1 above is more powerful and more likely to be adopted than B2, but B2
+       is still available and useful in some cases, for example when the cost
+       model does not allow much peeling.
 
-     Note, Possibility 3 above (which is peeling and versioning together) is not
-     being done at this time.  */
+     C) If none of above was successful then the alignment was not enhanced,
+       just return.  */
 
   /* (1) Peeling to force alignment.  */
 
-  /* (1.1) Decide whether to perform peeling, and how many iterations to peel:
+  /* (1.1) Decide whether to perform peeling, how many iterations to peel, and
+     if vectorization may be supported by peeling with versioning.
      Considerations:
-     + How many accesses will become aligned due to the peeling
+     - How many accesses will become aligned due to the peeling
      - How many accesses will become unaligned due to the peeling,
        and the cost of misaligned accesses.
      - The cost of peeling (the extra runtime checks, the increase
@@ -2732,9 +2779,27 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                              "Try peeling by %d\n", npeel);
         }
 
-      /* Ensure that all datarefs can be vectorized after the peel.  */
-      if (!vect_peeling_supportable (loop_vinfo, dr0_info, npeel))
-       do_peeling = false;
+      /* Check how peeling for alignment can support vectorization.  Function
+        vect_peeling_supportable returns one of the three possible values:
+        - PEELING_KNOWN_SUPPORTED: indicates that we know all unsupported
+          datarefs can be aligned after peeling.  We can use peeling alone.
+        - PEELING_MAYBE_SUPPORTED: indicates that peeling may be able to make
+          these datarefs aligned but we are not sure about it at compile time.
+          We will try peeling with versioning to add a runtime check to guard
+          the peeled loop.
+        - PEELING_UNSUPPORTED: indicates that peeling is very unlikely to make
+          vectorization supportable.  We will stop trying peeling.  */
+      switch (vect_peeling_supportable (loop_vinfo, dr0_info, npeel))
+       {
+       case peeling_known_supported:
+         break;
+       case peeling_maybe_supported:
+         try_peeling_with_versioning = true;
+         break;
+       case peeling_unsupported:
+         do_peeling = false;
+         break;
+       }
 
       /* Check if all datarefs are supportable and log.  */
       if (do_peeling
@@ -2811,7 +2876,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
                vect_update_misalignment_for_peel (dr_info, dr0_info, npeel);
              }
+       }
 
+      if (do_peeling && !try_peeling_with_versioning)
+       {
+         /* Update data structures if peeling will be applied alone.  */
           LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
           if (npeel)
             LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = npeel;
@@ -2939,6 +3008,11 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
         LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo).truncate (0);
     }
 
+  /* If we are trying peeling with versioning but versioning is disabled for
+     some reason, peeling must be turned off as well.  */
+  if (try_peeling_with_versioning && !do_versioning)
+    do_peeling = false;
+
   if (do_versioning)
     {
       const vec<stmt_vec_info> &may_misalign_stmts
@@ -2958,12 +3032,28 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
                              "Alignment of access forced using versioning.\n");
         }
 
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "Versioning for alignment will be applied.\n");
-
-      /* Peeling and versioning can't be done together at this time.  */
-      gcc_assert (! (do_peeling && do_versioning));
+      if (do_peeling)
+       {
+         /* This point is reached if peeling and versioning are used together
+            to ensure alignment.  Update data structures to make sure the loop
+            is correctly peeled and the right runtime check is added for loop
+            versioning.  */
+         gcc_assert (try_peeling_with_versioning);
+         LOOP_VINFO_UNALIGNED_DR (loop_vinfo) = dr0_info;
+         LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = -1;
+         LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo) = true;
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "Both peeling and versioning will be applied.\n");
+       }
+      else
+       {
+         /* This point is reached if versioning is used alone.  */
+         LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo) = false;
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_NOTE, vect_location,
+                            "Versioning for alignment will be applied.\n");
+       }
 
       return opt_result::success ();
     }
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 56a4e9a8b63..18f7c1bcea6 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3787,10 +3787,11 @@ chain_cond_expr (tree *cond_expr, tree part_cond_expr)
 
    Input:
    COND_EXPR  - input conditional expression.  New conditions will be chained
-                with logical AND operation.
-   LOOP_VINFO - two fields of the loop information are used.
-                LOOP_VINFO_PTR_MASK is the mask used to check the alignment.
-                LOOP_VINFO_MAY_MISALIGN_STMTS contains the refs to be checked.
+               with logical AND operation.
+   LOOP_VINFO - three fields of the loop information are used.
+               LOOP_VINFO_PTR_MASK is the mask used to check the alignment.
+               LOOP_VINFO_MAY_MISALIGN_STMTS contains the refs to be checked.
+               LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT indicates which check applies.
 
    Output:
    COND_EXPR_STMT_LIST - statements needed to construct the conditional
@@ -3798,7 +3799,20 @@ chain_cond_expr (tree *cond_expr, tree part_cond_expr)
    The returned value is the conditional expression to be used in the if
    statement that controls which version of the loop gets executed at runtime.
 
-   The algorithm makes two assumptions:
+   Based on the boolean value of LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT, we decide
+   which type of check should be applied and create two different expressions
+   accordingly.
+     1) When LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is false, we see if all data refs
+       to be checked are already aligned to an alignment boundary.  We create
+       an expression of "(a_1 | a_2 | a_3 | ... | a_n) & mask", where "a_i" is
+       the address of i'th data reference.
+     2) When LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true, we see if all data refs
+       can be aligned to a boundary after a certain amount of peeling, in other
+       words, their addresses have the same bottom bits according to the mask.
+       We create "((a_1 ^ a_2) | (a_2 ^ a_3) | ... | (a_n-1 ^ a_n)) & mask",
+       where "a_i" is the address of i'th data reference.
+
+   Both algorithms make two assumptions:
      1) The number of bytes "n" in a vector is a power of 2.
      2) An address "a" is aligned if a%n is zero and that this
         test can be done as a&(n-1) == 0.  For example, for 16
@@ -3818,6 +3832,7 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
   tree int_ptrsize_type;
   char tmp_name[20];
   tree or_tmp_name = NULL_TREE;
+  tree prev_addr_tmp_name = NULL_TREE;
   tree and_tmp_name;
   gimple *and_stmt;
   tree ptrsize_zero;
@@ -3829,16 +3844,19 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
 
   int_ptrsize_type = signed_type_for (ptr_type_node);
 
-  /* Create expression (mask & (dr_1 || ... || dr_n)) where dr_i is the address
-     of the first vector of the i'th data reference. */
+  /* If LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true, we should have at least two
+     datarefs to check the mutual alignment.  */
+  gcc_assert (may_misalign_stmts.length () > 1
+             || !LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo));
 
   FOR_EACH_VEC_ELT (may_misalign_stmts, i, stmt_info)
     {
       gimple_seq new_stmt_list = NULL;
       tree addr_base;
       tree addr_tmp_name;
+      tree xor_tmp_name;
       tree new_or_tmp_name;
-      gimple *addr_stmt, *or_stmt;
+      gimple *addr_stmt, *or_stmt, *xor_stmt;
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       bool negative = tree_int_cst_compare
        (DR_STEP (STMT_VINFO_DATA_REF (stmt_info)), size_zero_node) < 0;
@@ -3860,20 +3878,56 @@ vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
       addr_stmt = gimple_build_assign (addr_tmp_name, NOP_EXPR, addr_base);
       gimple_seq_add_stmt (cond_expr_stmt_list, addr_stmt);
 
-      /* The addresses are OR together.  */
-
-      if (or_tmp_name != NULL_TREE)
-        {
-          /* create: or_tmp = or_tmp | addr_tmp */
-          sprintf (tmp_name, "orptrs%d", i);
-         new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL, tmp_name);
-         or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
-                                        or_tmp_name, addr_tmp_name);
-         gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
-          or_tmp_name = new_or_tmp_name;
-        }
+      if (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT (loop_vinfo))
+       {
+         /* Create "((a_1 ^ a_2) | (a_2 ^ a_3) | ... | (a_n-1 ^ a_n)) & mask"
+            to check mutual alignment.  */
+         if (prev_addr_tmp_name != NULL_TREE)
+           {
+             sprintf (tmp_name, "xorptrs%d_%d", i - 1, i);
+             xor_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
+                                                tmp_name);
+             xor_stmt = gimple_build_assign (xor_tmp_name, BIT_XOR_EXPR,
+                                             prev_addr_tmp_name,
+                                             addr_tmp_name);
+             gimple_seq_add_stmt (cond_expr_stmt_list, xor_stmt);
+             if (or_tmp_name == NULL_TREE)
+               {
+                 /* Create the 1st XOR when the 2nd data ref is seen.  */
+                 or_tmp_name = xor_tmp_name;
+               }
+             else
+               {
+                 /* Create: or_tmp = or_tmp | new_xor_tmp.  */
+                 sprintf (tmp_name, "orxors%d", i - 1);
+                 new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
+                                                       tmp_name);
+                 or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
+                                                or_tmp_name, xor_tmp_name);
+                 gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
+                 or_tmp_name = new_or_tmp_name;
+               }
+           }
+         prev_addr_tmp_name = addr_tmp_name;
+       }
       else
-        or_tmp_name = addr_tmp_name;
+       {
+         /* Create: "(a_1 | a_2 | a_3 | ... | a_n) & mask" to check if all
+            addresses are already aligned.  */
+         if (or_tmp_name != NULL_TREE)
+           {
+             /* Create: or_tmp = or_tmp | addr_tmp.  */
+             sprintf (tmp_name, "orptrs%d", i);
+             new_or_tmp_name = make_temp_ssa_name (int_ptrsize_type, NULL,
+                                                   tmp_name);
+             or_stmt = gimple_build_assign (new_or_tmp_name, BIT_IOR_EXPR,
+                                            or_tmp_name, addr_tmp_name);
+             gimple_seq_add_stmt (cond_expr_stmt_list, or_stmt);
+             or_tmp_name = new_or_tmp_name;
+           }
+         else
+           or_tmp_name = addr_tmp_name;
+       }
 
     } /* end for i */
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8c5761d3c55..76c17e75f2e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1069,6 +1069,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, vec_info_shared *shared)
     using_decrementing_iv_p (false),
     using_select_vl_p (false),
     epil_using_partial_vectors_p (false),
+    allow_mutual_alignment (false),
     partial_load_store_bias (0),
     peeling_for_gaps (false),
     peeling_for_niter (false),
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 7aa2b02b63c..6ccafaf18e6 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -56,6 +56,14 @@ enum dr_alignment_support {
   dr_aligned
 };
 
+/* Define type of peeling support to indicate how peeling for alignment can help
+   make vectorization supported.  */
+enum peeling_support {
+  peeling_known_supported,
+  peeling_maybe_supported,
+  peeling_unsupported
+};
+
 /* Define type of def-use cross-iteration cycle.  */
 enum vect_def_type {
   vect_uninitialized_def = 0,
@@ -946,6 +954,13 @@ public:
      epilogue of loop.  */
   bool epil_using_partial_vectors_p;
 
+  /* True if we've decided to use peeling with versioning together, which allows
+     unaligned unsupported data refs to be uniformly aligned after a certain
+     amount of peeling (mutual alignment).  Otherwise, we use versioning alone
+     so these data refs must be already aligned to a power-of-two boundary
+     without peeling.  */
+  bool allow_mutual_alignment;
+
   /* The bias for len_load and len_store.  For now, only 0 and -1 are
      supported.  -1 must be used when a backend does not support
      len_load/len_store with a length of zero.  */
@@ -1070,6 +1085,7 @@ public:
 #define LOOP_VINFO_USING_SELECT_VL_P(L) (L)->using_select_vl_p
 #define LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P(L)                             \
   (L)->epil_using_partial_vectors_p
+#define LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT(L) (L)->allow_mutual_alignment
 #define LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS(L) (L)->partial_load_store_bias
 #define LOOP_VINFO_VECT_FACTOR(L)          (L)->vectorization_factor
 #define LOOP_VINFO_MAX_VECT_FACTOR(L)      (L)->max_vectorization_factor
-- 
2.43.0

