[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-16 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #35 from rguenther at suse dot de  ---
On Tue, 16 Apr 2024, rearnsha at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231
> 
> --- Comment #34 from Richard Earnshaw  ---
> To be honest, I'm more concerned that we aren't eliminating a lot of these
> copies during the gimple optimization phase.  The memcpy is really a type
> punning step (that's strictly ISO C compliant, rather than using the GCC union
> extension), so ideally we'd recognize that and eliminate as many of the copies
> as possible (perhaps using some form of view_convert or whatever gimple is
> appropriate for changing the view without changing the contents).

Yeah, there's currently no way to represent a change just in the
effective type that wouldn't generate code in the end but still
serves as barrier for these type related optimizations.

When modifying the earlier store is an option then another possibility
would be to attach multiple effective types to it in some way.  Of course
that's pessimizing as well.

That said, the choice has been made to prune those "invalid" redundant
store removals but as we see here the implemented checks are not working
as intended.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-16 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #34 from Richard Earnshaw  ---
To be honest, I'm more concerned that we aren't eliminating a lot of these
copies during the gimple optimization phase.  The memcpy is really a type
punning step (that's strictly ISO C compliant, rather than using the GCC union
extension), so ideally we'd recognize that and eliminate as many of the copies
as possible (perhaps using some form of view_convert or whatever gimple is
appropriate for changing the view without changing the contents).

But that's for another day...
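
For readers unfamiliar with the idiom, here is a minimal sketch of memcpy-based type punning of the kind discussed above. It is an illustration only, not code from the testcase: the `Vec128` layout and the helper name `shift_lanes_left_8` are assumptions, chosen to mirror the `<< 8` on 16-bit lanes seen elsewhere in the thread.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 128-bit vector wrapper in the testcase.
struct Vec128 {
  std::uint32_t raw[4];
};

// View the 16 bytes of a Vec128 as eight 16-bit lanes and shift each lane
// left by 8. The bytes are copied in and out with memcpy rather than read
// through a casted pointer, so no access uses a mismatched effective type.
inline Vec128 shift_lanes_left_8(const Vec128 &v) {
  std::uint16_t lanes[8];
  static_assert(sizeof(lanes) == sizeof(Vec128), "views must have equal size");
  std::memcpy(lanes, &v, sizeof(lanes));  // change the view, not the contents
  for (std::uint16_t &lane : lanes)
    lane = static_cast<std::uint16_t>(lane << 8);
  Vec128 out;
  std::memcpy(&out, lanes, sizeof(out));  // and back to the original view
  return out;
}
```

Each memcpy here is semantically a no-op on the bytes, which is why one would hope for the copies to disappear during optimization.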

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Richard Biener  changed:

           What    |Removed                     |Added

           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #33 from Richard Biener  ---
Ah, there's now the commoned mems_same_for_tbaa_p.  And indeed postreload
triggers on the cselib.cc instance.  But there we only have

(gdb) p debug_rtx (src_equiv)
(mem/c:SI (value:SI 90:4664 @0x421eba8/0x42e10c0) [1 MEM[(struct Vec128
*)_179]+12 S4 A32])

and in the loc list elt the setting_insn

(insn 89 88 93 14 (parallel [
(set (mem/c:SI (reg/f:SI 12 ip [201]) [1 MEM[(struct Vec128
*)_179]+0 S4 A64])
(reg:SI 0 r0))
(set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
(const_int 4 [0x4])) [1 MEM[(struct Vec128 *)_179]+4 S4
A32])
(reg:SI 1 r1))
(set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
(const_int 8 [0x8])) [1 MEM[(struct Vec128 *)_179]+8 S4
A64])
(reg:SI 2 r2))
(set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
(const_int 12 [0xc])) [1 MEM[(struct Vec128 *)_179]+12
S4 A32])
(reg:SI 3 r3))
]) 435 {*stm4_}
 (nil))

cselib_redundant_set_p isn't a good API to alter an earlier SET but it might
be adjusted to return it so postreload could pass in an optional output
parameter which when present would relax the alias check and return the
earlier SET for further consideration / altering.  Hoping CSELIB tables
will be unaffected by altering that insn.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #32 from Richard Biener  ---
(In reply to Richard Earnshaw from comment #31)
> While that does seem to fix the bug, it's at the cost of 6 additional stores
> in the problematic test that are redundant other than changing the alias set
> view.

The alternative is to alter the earlier store MEM_ATTRs to use an
alias-set covering both which usually means using alias-set zero.
This will pessimize followup optimizations around the store though
but it might be a good trade-off if done only late - I'd say
after sched2 but it doesn't look like there's CSE/DSE after it.
So maybe after sched1 which effectively means after reload, but there's
no regular CSE after reload either.  The latest CSE is pass_cse2.
IIRC a minor complication is that the earlier insn isn't readily
available - IIRC 'dest' is copied/mangled and not necessarily the
single original RTX of the earlier SET_DEST (IIRC - it's been some time).

OTOH I think that correctness trumps optimization and if this is the
problematical transform then I don't see many options here.

In the place CSE applies the transform we'd have to set MEM_ALIAS_SET
to zero if the alias set condition doesn't hold and clear MEM_EXPR
if the MEM_EXPR condition doesn't hold.
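
A minimal sketch of that adjustment, as a toy model only. `ToyMemAttrs` and `toy_widen_earlier_store` are hypothetical names invented for illustration; they are not GCC's mem_attrs or any existing function, and the two boolean inputs stand for the results of the alias-set and MEM_EXPR checks mentioned above.

```cpp
#include <string>

// Toy stand-in for a store's memory attributes; NOT GCC's mem_attrs.
struct ToyMemAttrs {
  int alias_set;     // 0 plays the role of the "aliases everything" set
  std::string expr;  // stand-in for MEM_EXPR; empty means cleared
};

// Hypothetical helper: called on the earlier store just before a later,
// redundant store to the same location is deleted. Whatever check failed
// decides which attribute of the surviving store has to be weakened.
inline void toy_widen_earlier_store(ToyMemAttrs &earlier,
                                    bool alias_set_ok,
                                    bool mem_expr_ok) {
  if (!alias_set_ok)
    earlier.alias_set = 0;  // the surviving store must cover both views
  if (!mem_expr_ok)
    earlier.expr.clear();   // drop the now-misleading expression
}
```

The cost is the one described above: a store in alias set zero conflicts with everything, so later passes can do less around it.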

Note I can't get the cse.cc code to trigger with the full preprocessed
source and a cross to arm and using -O2 -fno-exceptions -march=armv7-a
-mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee -fmath-errno

You mention at one point an insn removed by postreload, but that doesn't
use alias_set_subset_of.  I also don't remember postreload doing redundant
store removal.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-15 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #31 from Richard Earnshaw  ---
While that does seem to fix the bug, it's at the cost of 6 additional stores in
the problematic test that are redundant other than changing the alias set view.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #30 from Richard Biener  ---
I have tested the following since that might confuse the redundant store
removal sanity checks.  It bootstraps fine on x86-64-unknown-linux-gnu but
causes

FAIL: gcc.dg/tree-ssa/ssa-dse-36.c scan-tree-dump-times dse1 "Deleted redundant
call" 3
FAIL: gcc.dg/tree-ssa/ssa-dse-36.c scan-tree-dump-times dse1 "Deleted redundant
store" 3

in particular foo1 and foo2 are no longer optimized.  Specifically foo1:

-  x = {};
+  MEM  [(struct X *)] = {};
+  memset (, 0, 10);

the lack of the 'memset' removal looks fishy since memset uses alias set
zero while the earlier store uses the alias set of struct X (but contains
alias set zero because of the char[] members).  For foo2:

   x = {};
+  x.mem1[5] = 0;

the issue is less clear since 'x' is also involved in the store to
x.mem1[5] (but that store also uses alias-set zero).  This shows the
situation is a bit odd wrt the behavior of a whole-aggregate store vs.
a component-wise store.  But again in both cases a later conflict check
with say *(int *)p, while conflicting with the memset and x.mem1[5] stores,
would not conflict with the x = {} store.

So this fallout is to be expected and desired.

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 808e2095d9b..bacae30db18 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -427,9 +427,7 @@ alias_set_subset_of (alias_set_type set1, alias_set_type set2)

   /* Check if set1 is a subset of set2.  */
   ase2 = get_alias_set_entry (set2);
-  if (ase2 != 0
-      && (ase2->has_zero_child
-	  || (ase2->children && ase2->children->get (set1))))
+  if (ase2 != 0 && ase2->children && ase2->children->get (set1))
     return true;

   /* As a special case we consider alias set of "void *" to be both subset

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #29 from Richard Earnshaw  ---
Sorry, I was looking at the wrong pair of insns.  The earlier store to that
location was insn 111.

111: [r212:SI (1 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI}

It appears that the problem is a disagreement between alias_set_subset_of ()
and alias_sets_conflict_p().  The former thinks sets 1 and 2 have a permissible
subset relationship (2 is a subset of 1), so removes the later store during
postreload.  The latter is then used during dependency analysis, where it
reports no conflict between the two sets, so no scheduling dependency is
added before sched2.
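
To make the disagreement concrete, here is a toy model. It is not GCC's implementation: the types and the `toy_*` functions are invented for illustration. It assumes the subset query has a has_zero_child shortcut (as in the alias.cc lines touched by the patch elsewhere in this thread) while the conflict query only looks at recorded children. The alias set numbers follow the thread (store sets 1 and 2, load set 3); that set 1 has has_zero_child set is an assumption.

```cpp
#include <map>
#include <set>

// Toy model only: NOT GCC's alias_set_entry.
struct AliasSetEntry {
  std::set<int> children;       // alias sets recorded as subsets of this one
  bool has_zero_child = false;  // alias set 0 was recorded as a child
};

using AliasTable = std::map<int, AliasSetEntry>;

// Subset query with the has_zero_child shortcut.
inline bool toy_subset_of(const AliasTable &t, int set1, int set2) {
  if (set2 == 0 || set1 == set2) return true;
  auto it = t.find(set2);
  if (it == t.end()) return false;
  return it->second.has_zero_child || it->second.children.count(set1) != 0;
}

// The same query without the shortcut.
inline bool toy_subset_of_strict(const AliasTable &t, int set1, int set2) {
  if (set2 == 0 || set1 == set2) return true;
  auto it = t.find(set2);
  return it != t.end() && it->second.children.count(set1) != 0;
}

// Conflict query: assumed to consult only the recorded children.
inline bool toy_conflict(const AliasTable &t, int set1, int set2) {
  if (set1 == 0 || set2 == 0 || set1 == set2) return true;
  auto has_child = [&t](int parent, int child) {
    auto it = t.find(parent);
    return it != t.end() && it->second.children.count(child) != 0;
  };
  return has_child(set1, set2) || has_child(set2, set1);
}
```

With a table in which only `t[1].has_zero_child` is set, `toy_subset_of(t, 2, 1)` is true, so the later store in set 2 looks redundant. `toy_conflict(t, 1, 3)` is false, so the surviving store in set 1 is not ordered against a load in set 3. `toy_subset_of_strict(t, 2, 1)` is false, which would keep the later store.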

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #28 from Richard Biener  ---
(In reply to Richard Earnshaw from comment #27)
> (In reply to Richard Earnshaw from comment #26)
> > (In reply to Richard Biener from comment #25)
> > > I think it's more interesting why
> > > 
> > > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> > > {r0:SI..r3:SI}
> > > 
> > > isn't considered as dependence?  Why does the earlier insn even come into
> > > play?  What's the breaking transform?  I guess insn 119 and 120 are
> > > exchanged?
> > 
> > Because 119 was deleted by postreload.  Doh! I should have spotted that.
> 
> But that ought to be ok, insn 115 is a store in alias set 0, so is picked up
> by later alias analysis.  It's just that the compiler then digs deeper and
> decides that that isn't an addressable object (at the gimple level) so there
> can't really be a dependency.

>   112: r214:SI=r109:SI-0x60
>   115: [r214:SI (0 MEM  [(char * {ref-all})]+0 S4
> A64)] = {r0:SI..r3:SI}
> ; _179 = D.33805
>   117: r217:SI=r109:SI-0x60
>   118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)]
>   116: r216:SI=r109:SI-0x10
> * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> {r0:SI..r3:SI}
> ; r218 = _179
> * 120: r218:V8HI=[r109:SI-0x10 (3 MEM  [(short int
> *)_179]+0 S16 A64)]

but 115 doesn't store at the same address as 119?  Yes, it has the same
value.

So it doesn't seem to be stack-slot sharing.  When we'd share D.33805
with *_179 then we'd have made D.33805 TREE_ADDRESSABLE and adjusted
points-to sets accordingly in update_alias_info_with_stack_vars.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #27 from Richard Earnshaw  ---
(In reply to Richard Earnshaw from comment #26)
> (In reply to Richard Biener from comment #25)
> > I think it's more interesting why
> > 
> > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> > {r0:SI..r3:SI}
> > 
> > isn't considered as dependence?  Why does the earlier insn even come into
> > play?  What's the breaking transform?  I guess insn 119 and 120 are
> > exchanged?
> 
> Because 119 was deleted by postreload.  Doh! I should have spotted that.

But that ought to be ok, insn 115 is a store in alias set 0, so is picked up by
later alias analysis.  It's just that the compiler then digs deeper and decides
that that isn't an addressable object (at the gimple level) so there can't
really be a dependency.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #26 from Richard Earnshaw  ---
(In reply to Richard Biener from comment #25)
> I think it's more interesting why
> 
> * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> {r0:SI..r3:SI}
> 
> isn't considered as dependence?  Why does the earlier insn even come into
> play?  What's the breaking transform?  I guess insn 119 and 120 are
> exchanged?

Because 119 was deleted by postreload.  Doh! I should have spotted that.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #25 from Richard Biener  ---
I think it's more interesting why

* 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
{r0:SI..r3:SI}

isn't considered as dependence?  Why does the earlier insn even come into
play?  What's the breaking transform?  I guess insn 119 and 120 are
exchanged?

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #24 from Andrew Pinski  ---
(In reply to Richard Earnshaw from comment #21)
> With my new testcase, compiled on an arm-none-eabi cross with 
> 
> cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard
> -mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4
> -march=armv7-a+neon-vfpv4 -O2 -fPIE -fvisibility=hidden
> -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno
> -fno-exceptions
> 
> The critical sequence, at the end of gimple optimization is:
> 
>   v = b;
>   MEM  [(char * {ref-all})] = MEM  char[16]> [(char * {ref-all})];
>   v ={v} {CLOBBER(eol)};
>   v = D.33805;
>   vect__239.652_700 = MEM  [(short int *)];
>   vect__240.653_702 = vect__239.652_700 << 8;
> 
> This generates the following (pseudo) rtl:
> 
> ; D.33805 = _179
>   113: r215:SI=r109:SI-0x10
>   114: {r0:SI..r3:SI} = [r215:SI (0 MEM  [(char *
> {ref-all})_179]+0 S4 A64)]
>   112: r214:SI=r109:SI-0x60
>   115: [r214:SI (0 MEM  [(char * {ref-all})]+0 S4
> A64)] = {r0:SI..r3:SI}
> ; _179 = D.33805
>   117: r217:SI=r109:SI-0x60
>   118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)]
>   116: r216:SI=r109:SI-0x10
> * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
> {r0:SI..r3:SI}
> ; r218 = _179
> * 120: r218:V8HI=[r109:SI-0x10 (3 MEM  [(short int
> *)_179]+0 S16 A64)]
>   121: r178:V8HI=unspec[r218:V8HI,const_vector] 451
> 
> The two key instructions have been starred. 
> 
> Things proceed OK until sched2, at which point, when building the
> dependencies, we fail to create a link between i119 and i120.  I've tracked
> this as far as ptr_deref_may_alias_decl_p (), where the call to
> may_be_aliased () decides that D.33805 cannot be aliased and thus there's no
> dependency.  But it's not clear to me why we've tracked back to the copy
> before the load of interest, nor why, at this point, we're looking at tree
> addressability to decide whether or not there are memory dependencies here.

This makes it sound like one of the -fstack-reuse= issues (see the linked bug
reports from PR 111843). Does -fstack-reuse=none help?

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #23 from Richard Earnshaw  ---
#0  ptr_deref_may_alias_decl_p (ptr=0x75e0c678, decl=0x75dff000)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:295
#1  0x01768173 in indirect_ref_may_alias_decl_p (ref1=0x75e9ad98, 
base1=0x75e9ad98, offset1=..., max_size1=..., size1=..., 
ref1_alias_set=3, base1_alias_set=3, ref2=0x75deae60, 
base2=0x75dff000, offset2=..., max_size2=..., size2=..., 
ref2_alias_set=0, base2_alias_set=0, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2102
#2  0x01769541 in refs_may_alias_p_2 (ref1=0x7fffceb0, 
ref2=0x7fffce70, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2505
#3  0x0176968a in refs_may_alias_p_1 (ref1=0x7fffce70, 
ref2=0x7fffceb0, tbaa_p=false)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2534
#4  0x00f7bf7d in rtx_refs_may_alias_p (x=0x75ed3b40, 
mem=0x75e9c9d8, tbaa_p=true)
at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/alias.cc:366
#5  0x00f8243b in true_dependence_1 (mem=0x75e9c9d8, 
mem_mode=E_SImode, mem_addr=0x75e9c9c0, x=0x75ed3b40, 
x_addr=0x75ed3b28, mem_canonicalized=false)

Where (in true_dependence_1):
p mem
$96 = (const_rtx) 0x75e9c9d8
(gdb) pr
(mem/c:SI (plus:SI (reg/f:SI 14 lr [214])
(const_int 4 [0x4])) [0 MEM  [(char *
{ref-all})]+4 S4 A32])

p x
$97 = (const_rtx) 0x75ed3b40
(gdb) pr
(mem/c:V8HI (plus:SI (reg/f:SI 13 sp)
(const_int 256 [0x100])) [3 MEM  [(short int
*)_179]+0 S16 A64])

in refs_may_alias_p_1:
p *ref1
$99 = {ref = 0x75e9ad98, base = 0x75e9ad98, 
  offset = {> = {coeffs = {0}}, }, 
  size = {> = {coeffs = {128}}, }, 
  max_size = {> = {coeffs = {128}}, }, 
  ref_alias_set = 3, base_alias_set = 3, volatile_p = false}
p *ref2
$100 = {ref = 0x75deae60, base = 0x75dff000, 
  offset = {> = {coeffs = {32}}, }, 
  size = {> = {coeffs = {32}}, }, 
  max_size = {> = {coeffs = {128}}, }, 
  ref_alias_set = 0, base_alias_set = 0, volatile_p = false}

p ref1->ref
$101 = (tree) 0x75e9ad98
(gdb) pt
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x77405498 precision:16 min  max

pointer_to_this  reference_to_this
>
V8HI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x7752d7e0 nunits:8
pointer_to_this >

arg:0 
sizes-gimplified public unsigned type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type
0x7740c150
pointer_to_this  reference_to_this
>
var 
def_stmt 
version:179
ptr-info 0x75e71468>
arg:1 
constant 0>>

p ref1->base
$102 = (tree) 0x75e9ad98
(gdb) pt
 
unit-size 
align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x77405498 precision:16 min  max

pointer_to_this  reference_to_this
>
V8HI
size 
unit-size 
align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type
0x7752d7e0 nunits:8
pointer_to_this >

arg:0 
sizes-gimplified public unsigned type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type
0x7740c150
pointer_to_this  reference_to_this
>
var 
def_stmt 
version:179
ptr-info 0x75e71468>
arg:1 
constant 0>>

p ref2->ref
$103 = (tree) 0x75deae60
(gdb) pt
 
unit-size 
align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x77405348 precision:8 min  max >
BLK
size 
unit-size 
user align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x76322d20
domain 
sizes-gimplified public type_6 SI
size 
unit-size 
align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x76b33d20 precision:32 min  max >
pointer_to_this >

arg:0 
public unsigned SI size  unit-size

align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
0x766db5e8>

arg:0 
used ignored BLK ../hwy-pr111231-cpp.cc:4461:27 size  unit-size 
align:64 warn_if_not_align:0 context  abstract_origin 
(mem/c:BLK (plus:SI (reg/f:SI 109 virtual-stack-vars)
(const_int -96 [0xffa0])) [2 D.33805+0 S16 A64])>
../hwy-pr111231-cpp.cc:4346:16 start: ../hwy-pr111231-cpp.cc:4346:3
finish: ../hwy-pr111231-cpp.cc:4346:24>
arg:1 
constant 0>>
p ref2->base
$104 = (tree) 0x75dff000
(gdb) pt
 
unit-size 
align:16 

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #22 from Richard Earnshaw  ---
(Previous analysis is based on gcc-13 branch)

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Richard Earnshaw  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #21 from Richard Earnshaw  ---
With my new testcase, compiled on an arm-none-eabi cross with 

cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard
-mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4 -march=armv7-a+neon-vfpv4
-O2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants
-fmath-errno -fno-exceptions

The critical sequence, at the end of gimple optimization is:

  v = b;
  MEM  [(char * {ref-all})] = MEM  [(char * {ref-all})];
  v ={v} {CLOBBER(eol)};
  v = D.33805;
  vect__239.652_700 = MEM  [(short int *)];
  vect__240.653_702 = vect__239.652_700 << 8;

This generates the following (pseudo) rtl:

; D.33805 = _179
  113: r215:SI=r109:SI-0x10
  114: {r0:SI..r3:SI} = [r215:SI (0 MEM  [(char *
{ref-all})_179]+0 S4 A64)]
  112: r214:SI=r109:SI-0x60
  115: [r214:SI (0 MEM  [(char * {ref-all})]+0 S4
A64)] = {r0:SI..r3:SI}
; _179 = D.33805
  117: r217:SI=r109:SI-0x60
  118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)]
  116: r216:SI=r109:SI-0x10
* 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] =
{r0:SI..r3:SI}
; r218 = _179
* 120: r218:V8HI=[r109:SI-0x10 (3 MEM  [(short int
*)_179]+0 S16 A64)]
  121: r178:V8HI=unspec[r218:V8HI,const_vector] 451

The two key instructions have been starred. 

Things proceed OK until sched2, at which point, when building the dependencies,
we fail to create a link between i119 and i120.  I've tracked this as far as
ptr_deref_may_alias_decl_p (), where the call to may_be_aliased () decides that
D.33805 cannot be aliased and thus there's no dependency.  But it's not clear
to me why we've tracked back to the copy before the load of interest, nor why,
at this point, we're looking at tree addressability to decide whether or not
there are memory dependencies here.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-04-11 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #20 from Richard Earnshaw  ---
Created attachment 57928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57928&action=edit
fully preprocessed testcase

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-25 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Richard Biener  changed:

   What|Removed |Added

   Priority|P1  |P2

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-22 Thread rearnsha at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

--- Comment #19 from Richard Earnshaw  ---
This is another problem with (I suspect) incorrect aliasing information.  If I
compile with -fno-strict-aliasing, I get

  88:   f4432a1f  vst1.8  {d18-d19}, [r3 :64] // {>E}   SP+96/16
  8c:   f4420a1f  vst1.8  {d16-d17}, [r2 :64] // {>A}   SP+32/16
  90:   e893000f  ldm r3, {r0, r1, r2, r3}    // {G}   SP+128/16
  98:   eddd0b20  vldr d16, [sp, #128] ; 0x80  // {B}   SP+48/16
  a4:   e28dc040  add ip, sp, #64 ; 0x40
  a8:   e885000f  stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16
  ac:   f2d80570  vshl.s16 q8, q8, #8
  b0:   f3f503e0  vneg.s16 q8, q8
  b4:   edcd0b20  vstr d16, [sp, #128] ; 0x80  // {>G.l} SP+128/8
  b8:   edcd1b22  vstr d17, [sp, #136] ; 0x88  // {>G.h} SP+136/8
  bc:   e894000f  ldm r4, {r0, r1, r2, r3}    // {C}   SP+64/16
  c4:   e28dc050  add ip, sp, #80 ; 0x50
  c8:   e88c000f  stm ip, {r0, r1, r2, r3}    // {>D}   SP+80/16
  cc:   e885000f  stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16

I've annotated each memory access with its stack address and labeled each
16-byte slot from A to G.

With -fstrict-aliasing this becomes:

  88:   f4420a1f  vst1.8  {d16-d17}, [r2 :64] // {>A}   SP+32/16
  8c:   eddd0b20  vldr d16, [sp, #128] ; 0x80  // {E}   SP+96/16
  98:   e893000f  ldm r3, {r0, r1, r2, r3}    // {B}   SP+48/16
  a0:   e28dc040  add ip, sp, #64 ; 0x40
  a4:   f2d80570  vshl.s16 q8, q8, #8
  a8:   e884000f  stm r4, {r0, r1, r2, r3}    // {>G}   SP+128/16
!
  ac:   e885000f  stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16
  b0:   f3f503e0  vneg.s16 q8, q8
  b4:   edcd0b20  vstr d16, [sp, #128] ; 0x80  // {>G.l} SP+128/8
  b8:   edcd1b22  vstr d17, [sp, #136] ; 0x88  // {>G.h} SP+136/8
  bc:   e894000f  ldm r4, {r0, r1, r2, r3}    // {C}   SP+64/16
  c4:   e28dc050  add ip, sp, #80 ; 0x50
  c8:   e88c000f  stm ip, {r0, r1, r2, r3}    // {>D}   SP+80/16
  cc:   e885000f  stm r5, {r0, r1, r2, r3}    // {>F}   SP+112/16

And we see that the initial store to G has been moved after the reads from it. 
I'm still digging, but it may be pertinent that the reads have been split into
two separate instructions; perhaps when the split was done the alias sets
weren't copied correctly.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-22 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P1

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-16 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Sam James  changed:

   What|Removed |Added

   Target Milestone|--- |12.4
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-03-17

--- Comment #18 from Sam James  ---
Confirmed.

[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

2024-03-16 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Sam James  changed:

           What    |Removed                     |Added

      Known to fail|                            |14.0
      Known to work|                            |11.4.1
            Summary|armhf: Miscompilation with  |[12/13/14 regression]
                   |-O2/-fno-exceptions level   |armhf: Miscompilation with
                   |(-fno-tree-vectorize is     |-O2/-fno-exceptions level
                   |working)                    |(-fno-tree-vectorize is
                   |                            |working)

--- Comment #17 from Sam James  ---
Adding missing regression markers. 11 is fine for me.