On 2/10/26 10:55, Tamar Christina wrote:
-----Original Message-----
From: Richard Biener <[email protected]>
Sent: 10 February 2026 10:33
To: Victor Do Nascimento <[email protected]>
Cc: [email protected]; Tamar Christina <[email protected]>
Subject: Re: [PATCH] vect: disable vectorization of non-gather elementwise loads [PR124037]
On Tue, 10 Feb 2026, Victor Do Nascimento wrote:
When vectorizing non-contiguous memory accesses, such as loads of a
particular struct member, with unknown bounds (i.e. iterating over a
pointer rather than an array), inadequate alignment checking allows a
single vectorized loop iteration to cross a page boundary. This can
lead to segmentation faults in the resulting binaries.
For example, for the given datatype:
typedef struct {
  uint64_t a;
  uint64_t b;
  uint32_t flag;
  uint32_t pad;
} Data;

and a loop such as:

int
foo (Data *ptr) {
  if (ptr == NULL)
    return -1;
  int cnt;
  for (cnt = 0; cnt < MAX; cnt++) {
    if (ptr->flag == 0)
      break;
    ptr++;
  }
  return cnt;
}
the vectorizer yields the following loop logic on armhf:
<bb 1>:
  _41 = ptr_4(D) + 16;

<bb 2>:
  _44 = MEM[(unsigned int *)ivtmp_42];
  ivtmp_45 = ivtmp_42 + 24;
  _46 = MEM[(unsigned int *)ivtmp_45];
  ivtmp_47 = ivtmp_45 + 24;
  _48 = MEM[(unsigned int *)ivtmp_47];
  ivtmp_49 = ivtmp_47 + 24;
  _50 = MEM[(unsigned int *)ivtmp_49];
  vect_cst__51 = {_44, _46, _48, _50};
  mask_patt_6.17_52 = vect_cst__51 == { 0, 0, 0, 0 };
  if (mask_patt_6.17_52 != { 0, 0, 0, 0 })
    goto <bb 4>;
  else
    ivtmp_43 = ivtmp_42 + 96;
    goto <bb 2>;

<bb 4>:
  ...
without any proper address alignment checks on the starting address
or on whether alignment is preserved across iterations.
Reject the vectorization of such cases.
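To make the hazard concrete, here is a small standalone illustration
(mine, not part of the patch; the 4096-byte page size and the -80
starting offset are assumptions mirroring the testcase below):

#include <stdint.h>
#include <stdio.h>

/* Illustration only: one vector iteration performs four element-wise
   loads of 'flag' (4 bytes each, 24 bytes apart, the first 16 bytes
   into the element), so it touches a 76-byte window that can straddle
   a page even when the scalar loop would have exited earlier.  */
static int
crosses_page (uintptr_t elem0)
{
  uintptr_t first = elem0 + 16;         /* flag of element 0 */
  uintptr_t last = first + 3 * 24 + 4;  /* one past flag of element 3 */
  return first / 4096 != (last - 1) / 4096;
}

int
main (void)
{
  /* Element 0 starts 80 bytes before a page boundary, as in the
     testcase below: the fourth flag load lands on the next page.  */
  printf ("%d\n", crosses_page (4096 - 80));  /* prints 1 */
}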
Why is peeling for alignment not applied? This is not a
STMT_VINFO_STRIDED_P access, and VMAT_ELEMENTWISE is only a fallback
for the access, so it should not disable the peeling-for-alignment
requirement here?

Richard.
My understanding from a quick look at this before was that the requested
alignment ends up not being a power of two, which gets the vectorizer to
correctly say that the alignment is not reachable (with the 24-byte
element stride and VF = 4 from the dump above, the requested alignment
is 96 bytes). And when you over-align it, the check disappears.
My guess is that, because of this, the target alignment is not set
correctly, and the block

  /* If this DR needs alignment for correctness, we must ensure the target
     alignment is a constant power-of-two multiple of the amount read per
     vector iteration or force masking.  */
  if (dr_safe_speculative_read_required (stmt_info)
      && (*alignment_support_scheme == dr_aligned
          && !mat_gather_scatter_p (*memory_access_type)))
    {

does not end up rejecting it. If that is true, then I agree with you
that the likely problem is that DR_TARGET_ALIGNMENT on the DR is
incorrect, which is why we don't reject the access.

So I wonder if we override it somewhere?

Thanks,
Tamar
Yes, the DR_TARGET_ALIGNMENT is indeed wrong. We have:

  if (loop_vinfo
      && dr_safe_speculative_read_required (stmt_info))
    {
      ...
      // new_alignment = correct alignment for group of reads
      if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
          || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
        {
          ...
          vector_alignment = new_alignment;
        }
    }
  SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);

so if the alignment is not a power of 2, we don't update the alignment
accordingly.
My idea is that if either `pow2p_hwi' condition fails, we round the
alignment up to the next power of 2 via `ceil_log2', i.e.
`HOST_WIDE_INT_1U << ceil_log2 (new_alignment.to_constant ())'.
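A minimal sketch of that rounding (my sketch, inside GCC where
ceil_log2 and HOST_WIDE_INT_1U come from hwint.h; the helper name is
hypothetical):

/* Hypothetical helper: round an alignment requirement up to the next
   power of two, e.g. 24 -> 32 and 96 -> 128, so peeling for alignment
   remains applicable.  Assumes x >= 1.  */
static inline unsigned HOST_WIDE_INT
round_up_pow2 (unsigned HOST_WIDE_INT x)
{
  return HOST_WIDE_INT_1U << ceil_log2 (x);
}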
But perhaps equally important, in `get_load_store_type' we have:

  if (*memory_access_type == VMAT_ELEMENTWISE
      || *memory_access_type == VMAT_GATHER_SCATTER_LEGACY
      || *memory_access_type == VMAT_STRIDED_SLP
      || *memory_access_type == VMAT_INVARIANT)
    {
      *alignment_support_scheme = dr_unaligned_supported;
      *misalignment = DR_MISALIGNMENT_UNKNOWN;
    }
So I think that, at the moment, irrespective of the value given to
DR_TARGET_ALIGNMENT, we never actually bother checking alignment, as
it's assumed that unaligned accesses are okay. In the context in
question I suppose we need `*alignment_support_scheme =
dr_unaligned_unsupported' here.
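Something like the following, as a sketch of the idea rather than the
tested patch (whether VMAT_GATHER_SCATTER_LEGACY and VMAT_INVARIANT
need the same treatment is left open):

  /* Sketch only: when the read must not fault speculatively, stop
     claiming that unaligned element-wise accesses are supported, so
     the alignment requirement is actually enforced.  */
  if (*memory_access_type == VMAT_ELEMENTWISE
      || *memory_access_type == VMAT_STRIDED_SLP)
    {
      *alignment_support_scheme
        = (dr_safe_speculative_read_required (stmt_info)
           ? dr_unaligned_unsupported : dr_unaligned_supported);
      *misalignment = DR_MISALIGNMENT_UNKNOWN;
    }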
I have a patch that fixes these two issues (assuming both are necessary
and sufficient fixes); I'm currently checking that all regression tests
pass.

The original patch was intended as a minimal correctness check that
could be backported to gcc15, rejecting the flawed cases, with the
bigger patch currently under test going into gcc16 and allowing the
potential peeling for alignment to still apply.

Once I've got a working fix for the issues highlighted above, I'll post
it as a v2.
Any feedback here is most welcome.
Cheers,
V.
gcc/ChangeLog:
PR tree-optimization/124037
* tree-vect-stmts.cc (get_load_store_type): Add check for use
context of VMAT_ELEMENTWISE loads.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr124037.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr124037.c | 57 ++++++++++++++++++++++++++++
 gcc/tree-vect-stmts.cc               | 13 +++++++
 2 files changed, 70 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/pr124037.c
diff --git a/gcc/testsuite/gcc.dg/vect/pr124037.c b/gcc/testsuite/gcc.dg/vect/pr124037.c
new file mode 100644
index 00000000000..3121380454b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr124037.c
@@ -0,0 +1,57 @@
+/* PR tree-optimization/124037 */
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target mmap } */
+/* { dg-require-effective-target vect_early_break } */
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#define MAX 65536
+
+typedef struct {
+  uint64_t a;
+  uint64_t b;
+  uint32_t flag;
+  uint32_t pad;
+} Data;
+
+int __attribute__ ((noinline))
+foo (Data *ptr) {
+  if (ptr == NULL)
+    return -1;
+
+  int cnt;
+  for (cnt = 0; cnt < MAX; cnt++) {
+    if (ptr->flag == 0)
+      break;
+    ptr++;
+  }
+  return cnt;
+}
+
+int main() {
+  long pgsz = sysconf (_SC_PAGESIZE);
+  if (pgsz == -1) {
+    fprintf (stderr, "sysconf failed\n");
+    return 0;
+  }
+
+  /* Allocate 2 consecutive pages.  */
+  void *mem = mmap (NULL, pgsz * 2, PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (mem == MAP_FAILED) {
+    fprintf (stderr, "mmap failed\n");
+    return 0;
+  }
+
+  memset (mem, 1, pgsz);
+  uint8_t *ptr = (uint8_t*) mem + pgsz;
+  memset (ptr - 16, 0, 16);
+
+  mprotect (ptr, pgsz, PROT_NONE);
+  Data *data = (Data *) (ptr - 80);
+  __builtin_printf ("%d\n", foo (data));
+}
+
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ee98e72d1e5..7434d934463 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2581,6 +2581,19 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+  if (loop_vinfo
+      && !inbounds
+      && dr_safe_speculative_read_required (stmt_info)
+      && *memory_access_type == VMAT_ELEMENTWISE
+      && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "unsafe use of elementwise access for buffer with"
+                         " unknown bounds\n");
+      return false;
+    }
+
/* If this DR needs alignment for correctness, we must ensure the target
alignment is a constant power-of-two multiple of the amount read per
vector iteration or force masking. */
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
Nuernberg)