On Wed, 18 Feb 2026, Victor Do Nascimento wrote:

> New in V2:
> ----------
>   - The approach of the patch is largely the same as in v1, the key
>   functional distinction being that in v1 `alignment_support_scheme'
>   had been categorized as `dr_unaligned_unsupported'.  That category
>   indicates memory accesses which are known to be unaligned at compile
>   time, on an architecture where vectorized unaligned accesses are
>   unsupported.  For accesses whose alignment may not be known but for
>   which alignment must be guaranteed at runtime, the correct category
>   is `dr_aligned'.
> 
>   - Finally, a simplification is made.  Before, we had the check:
> 
>   if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
>        || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
> 
>   Given that we have `new_alignment = vf * align_factor_c', I do not
>   believe anything can be derived from `new_alignment' that is not
>   already known from `align_factor_c'.  That is, a power of 2 has only
>   one prime factor: 2.  If `align_factor_c' is not itself a power of 2,
>   the multiplication by `vf' cannot correct that, so the check on
>   `new_alignment' seems redundant.  The predicate can thus be
>   simplified to `if (pow2p_hwi (align_factor_c))'.  This seems sound to
>   me, provided my logic is correct; a small standalone sketch of the
>   argument follows below.
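> 
>   For illustration, a minimal standalone sketch of that argument (mine,
>   not part of the patch; `pow2p' stands in for `pow2p_hwi'):
> 
>     #include <stdio.h>
>     #include <stdbool.h>
> 
>     /* True iff x is a non-zero power of two.  */
>     static bool pow2p (unsigned long long x) { return x && (x & (x - 1)) == 0; }
> 
>     int main (void)
>     {
>       /* align_factor_c = 24 = 2^3 * 3 carries the odd prime factor 3, so
>          vf * align_factor_c keeps that factor for any integral vf and can
>          never become a power of two.  */
>       unsigned long long align_factor_c = 24;
>       for (unsigned long long vf = 1; vf <= 16; vf *= 2)
>         printf ("vf=%llu new_alignment=%llu pow2p=%d\n",
>                 vf, vf * align_factor_c, pow2p (vf * align_factor_c));
>       return 0;
>     }
> 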
> -------------------
> For the vectorization of non-contiguous memory accesses such as the
> vectorization of loads from a particular struct member, specifically
> when vectorizing with unknown bounds (thus using a pointer and not an
> array) it is observed that inadequate alignment checking allows for
> the crossing of a page boundary within a single vectorized loop
> iteration. This leads to potential segmentation faults in the
> resulting binaries.
> 
> For example, for the given datatype:
> 
>     typedef struct {
>       uint64_t a;
>       uint64_t b;
>       uint32_t flag;
>       uint32_t pad;
>     } Data;
> 
> and a loop such as:
> 
> int
> foo (Data *ptr) {
>   if (ptr == NULL)
>     return -1;
> 
>   int cnt;
>   for (cnt = 0; cnt < MAX; cnt++) {
>     if (ptr->flag == 0)
>       break;
>     ptr++;
>   }
>   return cnt;
> }
> 
> the vectorizer yields the following cfg on armhf:
> 
> <bb 1>:
> _41 = ptr_4(D) + 16;
> <bb 2>:
> _44 = MEM[(unsigned int *)ivtmp_42];
> ivtmp_45 = ivtmp_42 + 24;
> _46 = MEM[(unsigned int *)ivtmp_45];
> ivtmp_47 = ivtmp_45 + 24;
> _48 = MEM[(unsigned int *)ivtmp_47];
> ivtmp_49 = ivtmp_47 + 24;
> _50 = MEM[(unsigned int *)ivtmp_49];
> vect_cst__51 = {_44, _46, _48, _50};
> mask_patt_6.17_52 = vect_cst__51 == { 0, 0, 0, 0 };
> if (mask_patt_6.17_52 != { 0, 0, 0, 0 })
>   goto <bb 4>;
> else
>   ivtmp_43 = ivtmp_42 + 96;
>   goto <bb 2>;
> <bb 4>:
> ...
> 
> without any proper address alignment checks on the starting address
> or on whether alignment is preserved across iterations. We therefore
> fix the handling of such cases.
> 
> In `vect_compute_data_ref_alignment' we ensure that the vector
> alignment requirement is large enough to encompass the entire address
> range covered by the group of element-wise loads carried out in a
> single vectorized iteration, rounded up to the nearest power of 2.
> Then, in `get_load_store_type', we ensure the correct categorization
> of the alignment support scheme for VMAT_ELEMENTWISE access types,
> depending on whether or not they occur within the context of a
> scalarized vector access in a loop with unknown bounds where
> speculative reads are required.  Where this is found to be the case,
> we modify `alignment_support_scheme' from `dr_unaligned_supported' to
> `dr_aligned', ensuring proper alignment is required.
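> 
> As a concrete illustration of the new alignment computation (a
> standalone sketch of mine, not part of the patch), using the numbers
> of the armhf example above:
> 
>   #include <stdio.h>
> 
>   /* Smallest power of two >= x, for x > 0, i.e. what the
>      HOST_WIDE_INT_1U << ceil_log2 (...) expression in the patch computes.  */
>   static unsigned long long round_up_pow2 (unsigned long long x)
>   {
>     unsigned long long r = 1;
>     while (r < x)
>       r <<= 1;
>     return r;
>   }
> 
>   int main (void)
>   {
>     /* Assuming align_factor_c ends up reflecting the 24-byte step of the
>        example (my reading, not verified) and a VF of 4.  */
>     unsigned long long align_factor_c = round_up_pow2 (24);  /* 24 -> 32 */
>     unsigned long long new_alignment = 4 * align_factor_c;   /* 128      */
>     /* The four loads of one vector iteration touch bytes [0, 76) relative
>        to ivtmp; a 128-byte-aligned base keeps that range within a single
>        128-byte block, which cannot straddle a page boundary.  */
>     printf ("new target alignment = %llu bytes\n", new_alignment);
>     return 0;
>   }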
> 
> gcc/ChangeLog:
> 
>       PR tree-optimization/124037
>       * tree-vect-stmts.cc (get_load_store_type): Fix
>       alignment_support_scheme categorization of VMAT_ELEMENTWISE
>       loads.
>       * tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
>       Round alignments to next power of 2.
> 
> gcc/testsuite/ChangeLog:
> 
>       * g++.dg/vect/vect-pr124037.cc: New.
>       * gcc.dg/vect/vect-pr124037.c: New.
> ---
>  gcc/testsuite/g++.dg/vect/vect-pr124037.cc | 38 ++++++++++++++
>  gcc/testsuite/gcc.dg/vect/vect-pr124037.c  | 58 ++++++++++++++++++++++
>  gcc/tree-vect-data-refs.cc                 | 19 ++++---
>  gcc/tree-vect-stmts.cc                     | 14 +++++-
>  4 files changed, 117 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/vect/vect-pr124037.cc
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-pr124037.c
> 
> diff --git a/gcc/testsuite/g++.dg/vect/vect-pr124037.cc b/gcc/testsuite/g++.dg/vect/vect-pr124037.cc
> new file mode 100644
> index 00000000000..6e46238c5de
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/vect/vect-pr124037.cc
> @@ -0,0 +1,38 @@
> +/* PR tree-optimization/124037 */
> +/* { dg-require-effective-target mmap } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-additional-options "-std=c++11" } */
> +
> +struct Token
> +{
> +  unsigned t, t1, t2;
> +  void *a, *b;
> +  unsigned short Kind;
> +  unsigned short flags;
> +  constexpr Token() : Token(0){}
> +  constexpr Token(int a): t(0), t1(0), t2 (0), a(nullptr),b (nullptr),
> +                       Kind(a), flags(0) {}
> +  bool isNot(int K) const { return Kind != K; }
> +};
> +
> +[[gnu::noipa]]
> +unsigned getArgLength(const Token *ArgPtr) {
> +  unsigned NumArgTokens = 0;
> +  for (; ArgPtr->isNot(0); ++ArgPtr)
> +    ++NumArgTokens;
> +  return NumArgTokens;
> +}
> +
> +Token tokens[] = {{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},
> +               {1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},
> +               {1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},
> +               {1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},
> +               {1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},{1},
> +               {1},{1},{1},{1},{1},{1},{}};
> +
> +int main()
> +{
> +  __builtin_printf("%d\n", getArgLength(tokens));
> +}
> +/* { dg-final { scan-tree-dump "alignment increased due to early break" "vect" } } */
> +/* { dg-final { scan-tree-dump "step doesn't divide the vector alignment" "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-pr124037.c b/gcc/testsuite/gcc.dg/vect/vect-pr124037.c
> new file mode 100644
> index 00000000000..c6326bda5bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-pr124037.c
> @@ -0,0 +1,58 @@
> +/* PR tree-optimization/124037 */
> +/* { dg-require-effective-target mmap } */
> +/* { dg-require-effective-target vect_early_break } */
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include <unistd.h>
> +
> +#define MAX 65536
> +
> +typedef struct {
> +  uint64_t a;
> +  uint64_t b;
> +  uint32_t flag;
> +  uint32_t pad;
> +} Data;
> +
> +int __attribute__ ((noinline))
> +foo (Data *ptr) {
> +  if (ptr == NULL)
> +    return -1;
> +
> +  int cnt;
> +  for (cnt = 0; cnt < MAX; cnt++) {
> +    if (ptr->flag == 0)
> +      break;
> +    ptr++;
> +  }
> +  return cnt;
> +}
> +
> +int main() {
> +  long pgsz = sysconf (_SC_PAGESIZE);
> +  if (pgsz == -1) {
> +    fprintf (stderr, "sysconf failed\n");
> +    return 0;
> +  }
> +
> +  /* Allocate 2 consecutive pages.  */
> +  void *mem = mmap (NULL, pgsz * 2, PROT_READ | PROT_WRITE,
> +                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> +  if (mem == MAP_FAILED) {
> +    fprintf (stderr, "mmap failed\n");
> +    return 0;
> +  }
> +
> +  memset (mem, 1, pgsz);
> +  uint8_t *ptr = (uint8_t*) mem + pgsz;
> +  memset (ptr - 16, 0, 16);
> +
> +  mprotect (ptr, pgsz, PROT_NONE);
> +  Data *data = (Data *) (ptr - 80);
> +  __builtin_printf("%d\n", foo(data));
> +}
> +
> +/* { dg-final { scan-tree-dump "alignment increased due to early break" "vect" } } */
> +/* { dg-final { scan-tree-dump "step doesn't divide the vector alignment" "vect" } } */
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index e1facc7e957..c6f77bdc4b1 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1457,21 +1457,20 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
>        if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
>       align_factor_c *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
>  
> +      if (!pow2p_hwi (align_factor_c))
> +       align_factor_c = HOST_WIDE_INT_1U << ceil_log2 (align_factor_c);
> +

But when the size of the group is not a power of two, no power-of-two
alignment guarantees that we never cross a page boundary during
iteration?
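
To make that concrete with the cover letter's numbers (a quick sketch
of mine, assuming the 128-byte target alignment derived for the
24-byte struct with a VF of 4):

  #include <stdio.h>

  int main (void)
  {
    /* 24-byte stride, VF 4, first vector iteration starting on a
       128-byte aligned address: find the first 4-byte load falling
       outside that 128-byte block, i.e. the point where the initial
       alignment alone no longer rules out touching an unmapped page.  */
    unsigned long long off = 0;
    for (int iter = 0; iter < 4; iter++)
      for (int lane = 0; lane < 4; lane++, off += 24)
        if (off + 4 > 128)
          {
            printf ("vector iteration %d, lane %d reads bytes [%llu, %llu)\n",
                    iter, lane, off, off + 4);
            return 0;
          }
    return 0;
  }

So already the second vector iteration leaves the 128-byte block the
first one was aligned to.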

>        poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>        poly_uint64 new_alignment = vf * align_factor_c;
>  
> -      if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
> -       || (!vf.is_constant () && pow2p_hwi (align_factor_c)))

The VF need not be power-of-two, so I think you'd need both.  For
example, a power-of-two align_factor_c of 4 with a VF of 6 would give
a new_alignment of 24, which is not a power of two.  I think the
theory for !vf.is_constant () was that a non-constant VF is
always power-of-two(?!)

> +      if (dump_enabled_p ())
>       {
> -       if (dump_enabled_p ())
> -         {
> -           dump_printf_loc (MSG_NOTE, vect_location,
> -                            "alignment increased due to early break to ");
> -           dump_dec (MSG_NOTE, new_alignment);
> -           dump_printf (MSG_NOTE, " bytes.\n");
> -         }
> -       vector_alignment = new_alignment;
> +       dump_printf_loc (MSG_NOTE, vect_location,
> +                        "alignment increased due to early break to ");
> +       dump_dec (MSG_NOTE, new_alignment);
> +       dump_printf (MSG_NOTE, " bytes.\n");
>       }
> +      vector_alignment = new_alignment;

I suppose for the case where no alignment is safe we'd have to fail
here or later.  Is that artificial power-of-two alignment then
correctly determined as not maintained later?

>      }
>  
>    SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index ee98e72d1e5..473a77bbc08 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -2502,8 +2502,18 @@ get_load_store_type (vec_info  *vinfo, stmt_vec_info stmt_info,
>        || *memory_access_type == VMAT_CONTIGUOUS_REVERSE)
>      *poffset = neg_ldst_offset;
>  
> -  if (*memory_access_type == VMAT_ELEMENTWISE
> -      || *memory_access_type == VMAT_GATHER_SCATTER_LEGACY
> +  if (*memory_access_type == VMAT_ELEMENTWISE)
> +    {
> +      if (dr_safe_speculative_read_required (stmt_info)
> +       && !STMT_VINFO_GATHER_SCATTER_P (stmt_info)
> +       && !DR_SCALAR_KNOWN_BOUNDS (STMT_VINFO_DR_INFO (stmt_info)))
> +     *alignment_support_scheme = dr_aligned;
> +      else
> +     *alignment_support_scheme = dr_unaligned_supported;
> +      *misalignment = DR_MISALIGNMENT_UNKNOWN;
> +    }
> +
> +  else if (*memory_access_type == VMAT_GATHER_SCATTER_LEGACY
>        || *memory_access_type == VMAT_STRIDED_SLP
>        || *memory_access_type == VMAT_INVARIANT)
>      {


I think it would be clearer to have

    if (dr_safe_speculative_read_required (stmt_info))
      *alignment_support_scheme = dr_aligned;
    else if (...

I'm unsure about *misalignment; I'd say the existing else path
computes it correctly.  Robin last refactored all this IIRC.

Richard.

> 

-- 
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
