https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97043

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
     Ever confirmed|0                           |1
             Blocks|                            |96522
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-09-14

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
This blocks backporting the fix for PR96522, causing the gcc.dg/vect/pr81410.c
testcase to FAIL execution with an unaligned access using an aligned load.
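
To illustrate why an aligned load at the wrong address faults, here is a
minimal sketch. It assumes a 16-byte-aligned base pointer and the 16-byte
alignment that movdqa requires of its memory operand. The names VECTOR_ALIGN
and offset_is_aligned are hypothetical and do not come from GCC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Alignment an aligned 128-bit SSE load (movdqa) requires, in bytes.  */
enum { VECTOR_ALIGN = 16 };

/* Assuming the base pointer itself is VECTOR_ALIGN-aligned, an access at
   base + byte_offset is aligned only if the offset is a multiple of
   VECTOR_ALIGN.  Hypothetical helper, for illustration only.  */
static bool
offset_is_aligned (size_t byte_offset)
{
  return byte_offset % VECTOR_ALIGN == 0;
}
```

Under that assumption offsets 0, 16 and 32 are fine for an aligned load,
while an offset of 24 is only 8-byte aligned and needs an unaligned load.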

The trunk rev. that fixed this is gbc484e250990393e887f7239157cc85ce6fadcce

A pragmatic fix might be

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index f6331eeea86..3fdf56f9335 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -2309,9 +2309,8 @@ vect_analyze_slp_instance (vec_info *vinfo,
                  /* The load requires permutation when unrolling exposes
                     a gap either because the group is larger than the SLP
                     group-size or because there is a gap between the groups.  */
-                 && (known_eq (unrolling_factor, 1U)
-                     || (group_size == DR_GROUP_SIZE (first_stmt_info)
-                         && DR_GROUP_GAP (first_stmt_info) == 0)))
+                 && group_size == DR_GROUP_SIZE (first_stmt_info)
+                 && DR_GROUP_GAP (first_stmt_info) == 0)
                {
                  SLP_TREE_LOAD_PERMUTATION (load_node).release ();
                  continue;

with the biggest effect probably on load-lanes targets (arm/aarch64), where
we would then end up using load-lanes more often.  For the testcase in
question we then generate the following, matching trunk

        movdqa  (%rdx), %xmm2
        movdqa  16(%rdx), %xmm0
        shufpd  $1, 32(%rdx), %xmm0

instead of

        movdqa  (%rdx), %xmm1
        addq    $48, %rdx
        movdqu  -24(%rdx), %xmm2

(or with the backport of PR96522 a wrong movdqa in place of the movdqu).
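
The effect of the diff above on the check can be sketched in isolation as
follows. This models only the sub-condition the patch touches, not the whole
test in vect_analyze_slp_instance. The struct and function names are
hypothetical, and the unrolling factor is a plain integer here rather than
the poly-int that known_eq compares in GCC.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the values the check reads; in GCC they come
   from group_size, DR_GROUP_SIZE (first_stmt_info),
   DR_GROUP_GAP (first_stmt_info) and unrolling_factor.  */
struct load_group
{
  unsigned slp_group_size;
  unsigned dr_group_size;
  unsigned dr_group_gap;
  unsigned unrolling_factor;
};

/* Sub-condition before the patch: without unrolling the load permutation
   is released even when the group is larger or has a gap.  */
static bool
old_can_release_permutation (const struct load_group *g)
{
  return g->unrolling_factor == 1
	 || (g->slp_group_size == g->dr_group_size
	     && g->dr_group_gap == 0);
}

/* Sub-condition after the patch: the permutation is released only when
   the SLP group covers the whole interleaving group with no gap.  */
static bool
new_can_release_permutation (const struct load_group *g)
{
  return g->slp_group_size == g->dr_group_size
	 && g->dr_group_gap == 0;
}
```

The two versions differ only when the unrolling factor is 1 and the group is
larger than the SLP group or has a gap: the old form releases the load
permutation in that case, the patched form keeps it.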


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96522
[Bug 96522] [9/10 Regression] Incorrect with with -O -fno-tree-pta
