On 2/10/26 10:55, Tamar Christina wrote:
-----Original Message-----
From: Richard Biener <[email protected]>
Sent: 10 February 2026 10:33
To: Victor Do Nascimento <[email protected]>
Cc: [email protected]; Tamar Christina <[email protected]>
Subject: Re: [PATCH] vect: disable vectorization of non-gather elementwise loads [PR124037]
On Tue, 10 Feb 2026, Victor Do Nascimento wrote:
When vectorizing non-contiguous memory accesses, such as loads of a
particular struct member, with unknown bounds (i.e. iterating over a
pointer rather than an array), inadequate alignment checking allows a
single vectorized loop iteration to cross a page boundary. This can
lead to segmentation faults in the resulting binaries.
For example, for the given datatype:
typedef struct {
  uint64_t a;
  uint64_t b;
  uint32_t flag;
  uint32_t pad;
} Data;

and a loop such as:

int
foo (Data *ptr) {
  if (ptr == NULL)
    return -1;
  int cnt;
  for (cnt = 0; cnt < MAX; cnt++) {
    if (ptr->flag == 0)
      break;
    ptr++;
  }
  return cnt;
}
the vectorizer yields the following loop logic on armhf:
<bb 1>:
  _41 = ptr_4(D) + 16;

<bb 2>:
  _44 = MEM[(unsigned int *)ivtmp_42];
  ivtmp_45 = ivtmp_42 + 24;
  _46 = MEM[(unsigned int *)ivtmp_45];
  ivtmp_47 = ivtmp_45 + 24;
  _48 = MEM[(unsigned int *)ivtmp_47];
  ivtmp_49 = ivtmp_47 + 24;
  _50 = MEM[(unsigned int *)ivtmp_49];
  vect_cst__51 = {_44, _46, _48, _50};
  mask_patt_6.17_52 = vect_cst__51 == { 0, 0, 0, 0 };
  if (mask_patt_6.17_52 != { 0, 0, 0, 0 })
    goto <bb 4>;
  else
    ivtmp_43 = ivtmp_42 + 96;
    goto <bb 2>;

<bb 4>:
  ...
without any proper address alignment checks on the starting address
or on whether alignment is preserved across iterations.
Reject the vectorization of such cases.
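To make the hazard concrete, here is a small standalone illustration
(mine, not part of the patch; the 4096-byte page size and the -80
starting offset are assumptions mirroring the testcase below):

#include <stdint.h>
#include <stdio.h>

/* Illustration only: one vector iteration performs four element-wise
   loads of 'flag' (4 bytes each, 24 bytes apart, the first 16 bytes
   into the element), so it touches a 76-byte window that can straddle
   a page even when the scalar loop would have exited earlier.  */
static int
crosses_page (uintptr_t elem0)
{
  uintptr_t first = elem0 + 16;         /* flag of element 0 */
  uintptr_t last = first + 3 * 24 + 4;  /* one past flag of element 3 */
  return first / 4096 != (last - 1) / 4096;
}

int
main (void)
{
  /* Element 0 starts 80 bytes before a page boundary, as in the
     testcase below: the fourth flag load lands on the next page.  */
  printf ("%d\n", crosses_page (4096 - 80));  /* prints 1 */
}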
Why is peeling for alignment not applied? This is not a
STMT_VINFO_STRIDED_P access, and VMAT_ELEMENTWISE is only a fallback
for the access, so it should not disable the peeling-for-alignment
requirement here?

Richard.
My understanding from a quick look at this before was that the requested
alignment ends up not being a power of two, which gets the vectorizer to
correctly say that the alignment is not reachable (with the 24-byte
element stride and VF = 4 from the dump above, the requested alignment
is 96 bytes). And when you over-align it, the check disappears.
My guess is that, because of this, the target alignment is not set
correctly, and the block

  /* If this DR needs alignment for correctness, we must ensure the target
     alignment is a constant power-of-two multiple of the amount read per
     vector iteration or force masking.  */
  if (dr_safe_speculative_read_required (stmt_info)
      && (*alignment_support_scheme == dr_aligned
          && !mat_gather_scatter_p (*memory_access_type)))
    {

does not end up rejecting it. If that is true, then I agree with you
that the likely problem is that DR_TARGET_ALIGNMENT on the DR is
incorrect, which is why we don't reject the access.

So I wonder if we override it somewhere?

Thanks,
Tamar
Yes, the DR_TARGET_ALIGNMENT is indeed wrong. We have:

  if (loop_vinfo
      && dr_safe_speculative_read_required (stmt_info))
    {
      ...
      // new_alignment = correct alignment for group of reads
      if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
          || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
        {
          ...
          vector_alignment = new_alignment;
        }
    }
  SET_DR_TARGET_ALIGNMENT (dr_info, vector_alignment);

so if the alignment is not a power of 2, we don't update the alignment
accordingly.
My idea is that if either `pow2p_hwi' condition fails, we round the
alignment up to the next power of 2 via `ceil_log2', i.e.
`HOST_WIDE_INT_1U << ceil_log2 (new_alignment.to_constant ())'.
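A minimal sketch of that rounding (my sketch, inside GCC where
ceil_log2 and HOST_WIDE_INT_1U come from hwint.h; the helper name is
hypothetical):

/* Hypothetical helper: round an alignment requirement up to the next
   power of two, e.g. 24 -> 32 and 96 -> 128, so peeling for alignment
   remains applicable.  Assumes x >= 1.  */
static inline unsigned HOST_WIDE_INT
round_up_pow2 (unsigned HOST_WIDE_INT x)
{
  return HOST_WIDE_INT_1U << ceil_log2 (x);
}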
But perhaps equally important, in `get_load_store_type' we have:

  if (*memory_access_type == VMAT_ELEMENTWISE
      || *memory_access_type == VMAT_GATHER_SCATTER_LEGACY
      || *memory_access_type == VMAT_STRIDED_SLP
      || *memory_access_type == VMAT_INVARIANT)
    {
      *alignment_support_scheme = dr_unaligned_supported;
      *misalignment = DR_MISALIGNMENT_UNKNOWN;
    }
So I think that, at the moment, irrespective of the value given to
DR_TARGET_ALIGNMENT, we never actually bother checking alignment, as
it's assumed that unaligned accesses are okay. In the context in
question I suppose we need `*alignment_support_scheme =
dr_unaligned_unsupported' here.
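Something like the following, as a sketch of the idea rather than the
tested patch (whether VMAT_GATHER_SCATTER_LEGACY and VMAT_INVARIANT
need the same treatment is left open):

  /* Sketch only: when the read must not fault speculatively, stop
     claiming that unaligned element-wise accesses are supported, so
     the alignment requirement is actually enforced.  */
  if (*memory_access_type == VMAT_ELEMENTWISE
      || *memory_access_type == VMAT_STRIDED_SLP)
    {
      *alignment_support_scheme
        = (dr_safe_speculative_read_required (stmt_info)
           ? dr_unaligned_unsupported : dr_unaligned_supported);
      *misalignment = DR_MISALIGNMENT_UNKNOWN;
    }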
I have a patch that fixes these two issues (assuming both are necessary
and sufficient fixes); I'm currently checking that all regression tests
pass.

The original patch was intended as a minimal correctness check that
could be backported to gcc15, rejecting the flawed cases, with the
bigger patch currently under test going into gcc16 and allowing the
potential peeling for alignment to still apply.

Once I've got a working fix for the issues highlighted above, I'll post
it as a v2.
Any feedback here is most welcome.
Cheers,
V.
gcc/ChangeLog:
PR tree-optimization/124037
* tree-vect-stmts.cc (get_load_store_type): Add check for use
context of VMAT_ELEMENTWISE loads.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr124037.c: New test.
---
 gcc/testsuite/gcc.dg/vect/pr124037.c | 57 ++++++++++++++++++++++++++++
 gcc/tree-vect-stmts.cc               | 13 +++++++
 2 files changed, 70 insertions(+)
create mode 100644 gcc/testsuite/gcc.dg/vect/pr124037.c
diff --git a/gcc/testsuite/gcc.dg/vect/pr124037.c b/gcc/testsuite/gcc.dg/vect/pr124037.c
new file mode 100644
index 00000000000..3121380454b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr124037.c
@@ -0,0 +1,57 @@
+/* PR tree-optimization/124037 */
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target mmap } */
+/* { dg-require-effective-target vect_early_break } */
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#define MAX 65536
+
+typedef struct {
+  uint64_t a;
+  uint64_t b;
+  uint32_t flag;
+  uint32_t pad;
+} Data;
+
+int __attribute__ ((noinline))
+foo (Data *ptr) {
+  if (ptr == NULL)
+    return -1;
+
+  int cnt;
+  for (cnt = 0; cnt < MAX; cnt++) {
+    if (ptr->flag == 0)
+      break;
+    ptr++;
+  }
+  return cnt;
+}
+
+int main() {
+  long pgsz = sysconf (_SC_PAGESIZE);
+  if (pgsz == -1) {
+    fprintf (stderr, "sysconf failed\n");
+    return 0;
+  }
+
+  /* Allocate 2 consecutive pages.  */
+  void *mem = mmap (NULL, pgsz * 2, PROT_READ | PROT_WRITE,
+                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (mem == MAP_FAILED) {
+    fprintf (stderr, "mmap failed\n");
+    return 0;
+  }
+
+  memset (mem, 1, pgsz);
+  uint8_t *ptr = (uint8_t*) mem + pgsz;
+  memset (ptr - 16, 0, 16);
+
+  mprotect (ptr, pgsz, PROT_NONE);
+  Data *data = (Data *) (ptr - 80);
+  __builtin_printf ("%d\n", foo (data));
+}
+
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index ee98e72d1e5..7434d934463 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2581,6 +2581,19 @@ get_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
return false;
}
+  if (loop_vinfo
+      && !inbounds
+      && dr_safe_speculative_read_required (stmt_info)
+      && *memory_access_type == VMAT_ELEMENTWISE
+      && !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    {
+      if (dump_enabled_p ())
+        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                         "unsafe use of elementwise access for buffer with"
+                         " unknown bounds\n");
+      return false;
+    }
+
/* If this DR needs alignment for correctness, we must ensure the target
alignment is a constant power-of-two multiple of the amount read per
vector iteration or force masking. */
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
Nuernberg)