[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #35 from rguenther at suse dot de ---
On Tue, 16 Apr 2024, rearnsha at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231
>
> --- Comment #34 from Richard Earnshaw ---
> To be honest, I'm more concerned that we aren't eliminating a lot of these
> copies during the gimple optimization phase. The memcpy is really a type
> punning step (that's strictly ISO C compliant, rather than using the GCC union
> extension), so ideally we'd recognize that and eliminate as many of the copies
> as possible (perhaps using some form of view_convert or whatever gimple is
> appropriate for changing the view without changing the contents).

Yeah, there's currently no way to represent a change just in the effective type that wouldn't generate code in the end but still serves as a barrier for these type-related optimizations. When modifying the earlier store is an option, another possibility would be to attach multiple effective types to it in some way. Of course that's pessimizing as well.

That said, the choice has been made to prune those "invalid" redundant store removals but, as we see here, the implemented checks are not working as intended.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #34 from Richard Earnshaw --- To be honest, I'm more concerned that we aren't eliminating a lot of these copies during the gimple optimization phase. The memcpy is really a type punning step (that's strictly ISO C compliant, rather than using the GCC union extension), so ideally we'd recognize that and eliminate as many of the copies as possible (perhaps using some form of view_convert or whatever gimple is appropriate for changing the view without changing the contents). But that's for another day...
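For readers unfamiliar with the idiom discussed in this comment, here is a minimal sketch of memcpy-based type punning next to the union form. The function names are hypothetical and the float/uint32_t pair is chosen only for illustration; the testcase in the bug uses 128-bit vectors instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as a 32-bit integer with memcpy.
   This is the form the comment describes as strictly ISO C compliant. */
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* assumption: 32-bit float */
    memcpy(&u, &f, sizeof u);       /* changes the view, not the contents */
    return u;
}

/* The same reinterpretation through a union, the form the comment
   refers to as the GCC union extension. */
static uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

Both functions return the same bit pattern. The memcpy in the first one is the kind of copy the comment would like the gimple optimizers to recognize as a pure change of view.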
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Richard Biener changed:

What    |Removed |Added
See Also|        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93946

--- Comment #33 from Richard Biener ---
Ah, there's now the commoned mems_same_for_tbaa_p. And indeed postreload triggers on the cselib.cc instance. But there we only have

(gdb) p debug_rtx (src_equiv)
(mem/c:SI (value:SI 90:4664 @0x421eba8/0x42e10c0) [1 MEM[(struct Vec128 *)_179]+12 S4 A32])

and in the loc list elt the setting_insn

(insn 89 88 93 14 (parallel [
            (set (mem/c:SI (reg/f:SI 12 ip [201]) [1 MEM[(struct Vec128 *)_179]+0 S4 A64])
                (reg:SI 0 r0))
            (set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
                        (const_int 4 [0x4])) [1 MEM[(struct Vec128 *)_179]+4 S4 A32])
                (reg:SI 1 r1))
            (set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
                        (const_int 8 [0x8])) [1 MEM[(struct Vec128 *)_179]+8 S4 A64])
                (reg:SI 2 r2))
            (set (mem/c:SI (plus:SI (reg/f:SI 12 ip [201])
                        (const_int 12 [0xc])) [1 MEM[(struct Vec128 *)_179]+12 S4 A32])
                (reg:SI 3 r3))
        ]) 435 {*stm4_}
     (nil))

cselib_redundant_set_p isn't a good API to alter an earlier SET, but it might be adjusted to return it, so postreload could pass in an optional output parameter which, when present, would relax the alias check and return the earlier SET for further consideration / altering. Hoping CSELIB tables will be unaffected by altering that insn.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #32 from Richard Biener ---
(In reply to Richard Earnshaw from comment #31)
> While that does seem to fix the bug, it's at the cost of 6 additional stores
> in the problematic test that are redundant other than changing the alias set
> view.

The alternative is to alter the earlier store's MEM_ATTRs to use an alias-set covering both, which usually means using alias-set zero. This will pessimize followup optimizations around the store, but it might be a good trade-off if done only late - I'd say after sched2, but it doesn't look like there's CSE/DSE after it. So maybe after sched1, which effectively means after reload, but there's no regular CSE after reload either. The latest CSE is pass_cse2.

IIRC a minor complication is that the earlier insn isn't readily available - IIRC 'dest' is copied/mangled and not necessarily the single original RTX of the earlier SET_DEST (IIRC - it's been some time).

OTOH I think that correctness trumps optimization, and if this is the problematical transform then I don't see many options here. In the place CSE applies the transform we'd have to set MEM_ALIAS_SET to zero if the alias set condition doesn't hold and clear MEM_EXPR if the MEM_EXPR condition doesn't hold.

Note I can't get the cse.cc code to trigger with the full preprocessed source and a cross to arm and using

-O2 -fno-exceptions -march=armv7-a -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee -fmath-errno

You mention at one point an insn removed by postreload, but that doesn't use alias_set_subset_of. I also don't remember postreload doing redundant store removal.
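The alternative described in this comment (widen the earlier store's attributes instead of keeping the later store) can be modelled roughly as follows. This is a toy sketch only: struct toy_mem_attrs and toy_relax_earlier_store are hypothetical stand-ins, not GCC's mem_attrs or any existing GCC function, and the "condition doesn't hold" tests are simplified to plain inequality.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the attributes of an RTL MEM. */
struct toy_mem_attrs {
    int alias_set;      /* 0 models "conflicts with everything" */
    const char *expr;   /* printed MEM_EXPR, or NULL once cleared */
};

/* Before dropping a later store as redundant, make the earlier store
   conservative enough to stand in for it.  Returns true if the
   earlier store's attributes were changed. */
static bool toy_relax_earlier_store(struct toy_mem_attrs *earlier,
                                    const struct toy_mem_attrs *later)
{
    bool changed = false;
    assert(earlier != NULL && later != NULL);

    /* Simplified alias-set condition: differing sets fall back to 0. */
    if (earlier->alias_set != 0 && earlier->alias_set != later->alias_set) {
        earlier->alias_set = 0;
        changed = true;
    }
    /* Simplified MEM_EXPR condition: differing expressions are cleared. */
    if (earlier->expr != NULL
        && (later->expr == NULL || strcmp(earlier->expr, later->expr) != 0)) {
        earlier->expr = NULL;
        changed = true;
    }
    return changed;
}
```

In this model the store count stays the same as with the redundant-store removal, at the price of the earlier store conflicting with more later accesses, which is the pessimization the comment mentions.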
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #31 from Richard Earnshaw --- While that does seem to fix the bug, it's at the cost of 6 additional stores in the problematic test that are redundant other than changing the alias set view.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #30 from Richard Biener ---
I have tested the following since that might confuse the redundant store removal sanity checks. It bootstraps fine on x86-64-unknown-linux-gnu but causes

FAIL: gcc.dg/tree-ssa/ssa-dse-36.c scan-tree-dump-times dse1 "Deleted redundant call" 3
FAIL: gcc.dg/tree-ssa/ssa-dse-36.c scan-tree-dump-times dse1 "Deleted redundant store" 3

in particular foo1 and foo2 are no longer optimized. Specifically foo1:

- x = {};
+ MEM [(struct X *)] = {};
+ memset (, 0, 10);

the lack of the 'memset' removal looks fishy since memset uses alias set zero while the earlier store uses the alias set of struct X (but contains alias set zero because of the char[] members). For foo2:

  x = {};
+ x.mem1[5] = 0;

the issue is less clear since 'x' is also involved in the store to x.mem1[5] (but that store also uses alias-set zero). This shows the situation is a bit odd wrt the behavior of a whole-aggregate store vs. a component-wise store. But again in both cases a later conflict check with say *(int *)p, while conflicting with the memset and x.mem1[5] stores, would not conflict with the x = {} store. So this fallout is to be expected and desired.

diff --git a/gcc/alias.cc b/gcc/alias.cc
index 808e2095d9b..bacae30db18 100644
--- a/gcc/alias.cc
+++ b/gcc/alias.cc
@@ -427,9 +427,7 @@ alias_set_subset_of (alias_set_type set1, alias_set_type set2)
   /* Check if set1 is a subset of set2. */
   ase2 = get_alias_set_entry (set2);
-  if (ase2 != 0
-      && (ase2->has_zero_child
-          || (ase2->children && ase2->children->get (set1))))
+  if (ase2 != 0 && ase2->children && ase2->children->get (set1))
     return true;

   /* As a special case we consider alias set of "void *" to be both subset
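To make the foo1/foo2 discussion concrete, the two store patterns can be written as below. This is a sketch under stated assumptions: struct X here is a hypothetical aggregate with only char-array members, loosely following the description in the comment; it is not copied from ssa-dse-36.c. ISO C `= {0}` is used in place of the `x = {}` shown in the dump.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical aggregate made only of char arrays. */
struct X {
    char mem0[10];
    char mem1[10];
};

/* foo1-like pattern: whole-aggregate zeroing, then a memset that
   writes bytes which are already zero. */
static struct X zero_then_memset(void)
{
    struct X x = {0};
    memset(&x, 0, 10);
    return x;
}

/* foo2-like pattern: whole-aggregate zeroing, then a component-wise
   store of a zero that is already there. */
static struct X zero_then_component(void)
{
    struct X x = {0};
    x.mem1[5] = 0;
    return x;
}

/* Helper: returns 1 when every byte of *x is zero. */
static int all_zero(const struct X *x)
{
    const unsigned char *p = (const unsigned char *)x;
    assert(x != NULL);
    for (size_t i = 0; i < sizeof *x; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

In both functions the second store does not change any byte, which is why DSE previously reported it as redundant. The comment's point is that the second store still differs from the first in the alias set it is performed with.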
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #29 from Richard Earnshaw ---
Sorry, I was looking at the wrong pair of insns. The earlier store to that location was insn 111.

111: [r212:SI (1 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI}

It appears that the problem is a disagreement between alias_set_subset_of () and alias_sets_conflict_p (). The former thinks sets 1 and 2 have a permissible subset relationship (2 is a subset of 1), so the later store is removed during postreload. The latter is then consulted when dependencies are built for sched2; it thinks there is no conflict between the two sets, so no scheduling dependency is added.
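The disagreement described here can be illustrated with a small toy model. Everything in it is hypothetical: struct toy_alias_entry only borrows the field names has_zero_child and children from the patch quoted in comment #30, and the conflict test is an assumption made for the sketch (it consults only the recorded children), not a transcription of GCC's alias_sets_conflict_p.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SETS 8

/* Hypothetical alias-set entry, indexed by alias-set number. */
struct toy_alias_entry {
    bool has_zero_child;          /* some member lives in alias set 0 */
    bool children[TOY_MAX_SETS];  /* children[s]: s is a recorded child */
};

/* Subset test including the has_zero_child shortcut. */
static bool toy_subset_of(const struct toy_alias_entry *sets,
                          int set1, int set2)
{
    assert(set1 >= 0 && set1 < TOY_MAX_SETS);
    assert(set2 >= 0 && set2 < TOY_MAX_SETS);
    if (set2 == 0 || set1 == set2)
        return true;
    return sets[set2].has_zero_child || sets[set2].children[set1];
}

/* Conflict test that looks only at recorded children (assumption). */
static bool toy_conflict_p(const struct toy_alias_entry *sets,
                           int set1, int set2)
{
    assert(set1 >= 0 && set1 < TOY_MAX_SETS);
    assert(set2 >= 0 && set2 < TOY_MAX_SETS);
    if (set1 == 0 || set2 == 0 || set1 == set2)
        return true;
    return sets[set1].children[set2] || sets[set2].children[set1];
}
```

With set 1 marked has_zero_child and no recorded children, toy_subset_of(sets, 2, 1) is true while toy_conflict_p(sets, 1, 2) is false. A pass trusting the first answer may delete a store in set 2 as redundant, and a later pass trusting the second answer then sees nothing that orders the remaining store in set 1 against a load in set 2.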
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #28 from Richard Biener --- (In reply to Richard Earnshaw from comment #27) > (In reply to Richard Earnshaw from comment #26) > > (In reply to Richard Biener from comment #25) > > > I think it's more interesting why > > > > > > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = > > > {r0:SI..r3:SI} > > > > > > isn't considered as dependence? Why does the earlier insn even come into > > > play? What's the breaking transform? I guess insn 119 and 120 are > > > exchanged? > > > > Because 119 was deleted by postreload. Doh! I should have spotted that. > > But that ought to be ok, insn 115 is a store in alias set 0, so is picked up > by later alias analysis. It's just that the compiler then digs deeper and > decides that that isn't an addressable object (at the gimple level) so there > can't really be a dependency. > 112: r214:SI=r109:SI-0x60 > 115: [r214:SI (0 MEM [(char * {ref-all})]+0 S4 > A64)] = {r0:SI..r3:SI} > ; _179 = D.33805 > 117: r217:SI=r109:SI-0x60 > 118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)] > 116: r216:SI=r109:SI-0x10 > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = > {r0:SI..r3:SI} > ; r218 = _179 > * 120: r218:V8HI=[r109:SI-0x10 (3 MEM [(short int > *)_179]+0 S16 A64)] but 115 doesn't store at the same address as 119? Yes, it has the same value. So it doesn't seem to be stack-slot sharing. When we'd share D.33805 with *_179 then we'd have made D.33805 TREE_ADDRESSABLE and adjusted points-to sets accordingly in update_alias_info_with_stack_vars.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #27 from Richard Earnshaw --- (In reply to Richard Earnshaw from comment #26) > (In reply to Richard Biener from comment #25) > > I think it's more interesting why > > > > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = > > {r0:SI..r3:SI} > > > > isn't considered as dependence? Why does the earlier insn even come into > > play? What's the breaking transform? I guess insn 119 and 120 are > > exchanged? > > Because 119 was deleted by postreload. Doh! I should have spotted that. But that ought to be ok, insn 115 is a store in alias set 0, so is picked up by later alias analysis. It's just that the compiler then digs deeper and decides that that isn't an addressable object (at the gimple level) so there can't really be a dependency.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #26 from Richard Earnshaw --- (In reply to Richard Biener from comment #25) > I think it's more interesting why > > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = > {r0:SI..r3:SI} > > isn't considered as dependence? Why does the earlier insn even come into > play? What's the breaking transform? I guess insn 119 and 120 are > exchanged? Because 119 was deleted by postreload. Doh! I should have spotted that.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #25 from Richard Biener --- I think it's more interesting why * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI} isn't considered as dependence? Why does the earlier insn even come into play? What's the breaking transform? I guess insn 119 and 120 are exchanged?
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #24 from Andrew Pinski --- (In reply to Richard Earnshaw from comment #21) > With my new testcase, compiled on an arm-none-eabi cross with > > cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard > -mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4 > -march=armv7-a+neon-vfpv4 -O2 -fPIE -fvisibility=hidden > -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno > -fno-exceptions > > The critical sequence, at the end of gimple optimization is: > > v = b; > MEM [(char * {ref-all})] = MEM char[16]> [(char * {ref-all})]; > v ={v} {CLOBBER(eol)}; > v = D.33805; > vect__239.652_700 = MEM [(short int *)]; > vect__240.653_702 = vect__239.652_700 << 8; > > This generates the following (pseudo) rtl: > > ; D.33805 = _179 > 113: r215:SI=r109:SI-0x10 > 114: {r0:SI..r3:SI} = [r215:SI (0 MEM [(char * > {ref-all})_179]+0 S4 A64)] > 112: r214:SI=r109:SI-0x60 > 115: [r214:SI (0 MEM [(char * {ref-all})]+0 S4 > A64)] = {r0:SI..r3:SI} > ; _179 = D.33805 > 117: r217:SI=r109:SI-0x60 > 118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)] > 116: r216:SI=r109:SI-0x10 > * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = > {r0:SI..r3:SI} > ; r218 = _179 > * 120: r218:V8HI=[r109:SI-0x10 (3 MEM [(short int > *)_179]+0 S16 A64)] > 121: r178:V8HI=unspec[r218:V8HI,const_vector] 451 > > The two key instructions have been starred. > > Things proceed OK until sched2, at which point, when building the > dependencies, we fail to create a link between i119 and i120. I've tracked > this as far as ptr_deref_may_alias_decl_p (), where the call to > may_be_aliased () decides that D.33805 cannot be aliased and thus there's no > dependency. But it's not clear to me why we've tracked back to the copy > before the load of interest, nor why, at this point, we're looking at tree > addressability to decide whether or not there are memory dependencies here. 
This makes it sound like one of the -fstack-reuse= issues (see the linked bug reports from PR 111843). Does -fstack-reuse=none help?
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #23 from Richard Earnshaw --- #0 ptr_deref_may_alias_decl_p (ptr=0x75e0c678, decl=0x75dff000) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:295 #1 0x01768173 in indirect_ref_may_alias_decl_p (ref1=0x75e9ad98, base1=0x75e9ad98, offset1=..., max_size1=..., size1=..., ref1_alias_set=3, base1_alias_set=3, ref2=0x75deae60, base2=0x75dff000, offset2=..., max_size2=..., size2=..., ref2_alias_set=0, base2_alias_set=0, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2102 #2 0x01769541 in refs_may_alias_p_2 (ref1=0x7fffceb0, ref2=0x7fffce70, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2505 #3 0x0176968a in refs_may_alias_p_1 (ref1=0x7fffce70, ref2=0x7fffceb0, tbaa_p=false) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/tree-ssa-alias.cc:2534 #4 0x00f7bf7d in rtx_refs_may_alias_p (x=0x75ed3b40, mem=0x75e9c9d8, tbaa_p=true) at /home/rearnsha/gnusrc/gcc-cross/gcc-13/gcc/alias.cc:366 #5 0x00f8243b in true_dependence_1 (mem=0x75e9c9d8, mem_mode=E_SImode, mem_addr=0x75e9c9c0, x=0x75ed3b40, x_addr=0x75ed3b28, mem_canonicalized=false) Where (in true_dependence_1): p mem $96 = (const_rtx) 0x75e9c9d8 (gdb) pr (mem/c:SI (plus:SI (reg/f:SI 14 lr [214]) (const_int 4 [0x4])) [0 MEM [(char * {ref-all})]+4 S4 A32]) p x $97 = (const_rtx) 0x75ed3b40 (gdb) pr (mem/c:V8HI (plus:SI (reg/f:SI 13 sp) (const_int 256 [0x100])) [3 MEM [(short int *)_179]+0 S16 A64]) in refs_may_alias_p_1: p *ref1 $99 = {ref = 0x75e9ad98, base = 0x75e9ad98, offset = {> = {coeffs = {0}}, }, size = {> = {coeffs = {128}}, }, max_size = {> = {coeffs = {128}}, }, ref_alias_set = 3, base_alias_set = 3, volatile_p = false} p *ref2 $100 = {ref = 0x75deae60, base = 0x75dff000, offset = {> = {coeffs = {32}}, }, size = {> = {coeffs = {32}}, }, max_size = {> = {coeffs = {128}}, }, ref_alias_set = 0, base_alias_set = 0, volatile_p = false} p ref1->ref $101 = (tree) 0x75e9ad98 (gdb) pt 
unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x77405498 precision:16 min max pointer_to_this reference_to_this > V8HI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x7752d7e0 nunits:8 pointer_to_this > arg:0 sizes-gimplified public unsigned type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type 0x7740c150 pointer_to_this reference_to_this > var def_stmt version:179 ptr-info 0x75e71468> arg:1 constant 0>> p ref1->base $102 = (tree) 0x75e9ad98 (gdb) pt unit-size align:16 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x77405498 precision:16 min max pointer_to_this reference_to_this > V8HI size unit-size align:64 warn_if_not_align:0 symtab:0 alias-set 3 canonical-type 0x7752d7e0 nunits:8 pointer_to_this > arg:0 sizes-gimplified public unsigned type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set 12 canonical-type 0x7740c150 pointer_to_this reference_to_this > var def_stmt version:179 ptr-info 0x75e71468> arg:1 constant 0>> p ref2->ref $103 = (tree) 0x75deae60 (gdb) pt unit-size align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x77405348 precision:8 min max > BLK size unit-size user align:16 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76322d20 domain sizes-gimplified public type_6 SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x76b33d20 precision:32 min max > pointer_to_this > arg:0 public unsigned SI size unit-size align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x766db5e8> arg:0 used ignored BLK ../hwy-pr111231-cpp.cc:4461:27 size unit-size align:64 warn_if_not_align:0 context abstract_origin (mem/c:BLK (plus:SI (reg/f:SI 109 virtual-stack-vars) (const_int -96 [0xffa0])) [2 D.33805+0 S16 A64])> ../hwy-pr111231-cpp.cc:4346:16 start: ../hwy-pr111231-cpp.cc:4346:3 finish: ../hwy-pr111231-cpp.cc:4346:24> arg:1 constant 0>> p ref2->base $104 = (tree) 
0x75dff000 (gdb) pt unit-size align:16
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #22 from Richard Earnshaw --- (Previous analysis is based on gcc-13 branch)
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 Richard Earnshaw changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #21 from Richard Earnshaw --- With my new testcase, compiled on an arm-none-eabi cross with cc1plus ../hwy-pr111231-cpp.cc -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee -marm -mlibarch=armv7-a+neon-vfpv4 -march=armv7-a+neon-vfpv4 -O2 -fPIE -fvisibility=hidden -fvisibility-inlines-hidden -fmerge-all-constants -fmath-errno -fno-exceptions The critical sequence, at the end of gimple optimization is: v = b; MEM [(char * {ref-all})] = MEM [(char * {ref-all})]; v ={v} {CLOBBER(eol)}; v = D.33805; vect__239.652_700 = MEM [(short int *)]; vect__240.653_702 = vect__239.652_700 << 8; This generates the following (pseudo) rtl: ; D.33805 = _179 113: r215:SI=r109:SI-0x10 114: {r0:SI..r3:SI} = [r215:SI (0 MEM [(char * {ref-all})_179]+0 S4 A64)] 112: r214:SI=r109:SI-0x60 115: [r214:SI (0 MEM [(char * {ref-all})]+0 S4 A64)] = {r0:SI..r3:SI} ; _179 = D.33805 117: r217:SI=r109:SI-0x60 118: {r0:SI..r3:SI} = [r217:SI (2 D.33805+0 S4 A64)] 116: r216:SI=r109:SI-0x10 * 119: [r216:SI (2 MEM[(struct Vec128 *)_179]+0 S4 A64)] = {r0:SI..r3:SI} ; r218 = _179 * 120: r218:V8HI=[r109:SI-0x10 (3 MEM [(short int *)_179]+0 S16 A64)] 121: r178:V8HI=unspec[r218:V8HI,const_vector] 451 The two key instructions have been starred. Things proceed OK until sched2, at which point, when building the dependencies, we fail to create a link between i119 and i120. I've tracked this as far as ptr_deref_may_alias_decl_p (), where the call to may_be_aliased () decides that D.33805 cannot be aliased and thus there's no dependency. But it's not clear to me why we've tracked back to the copy before the load of interest, nor why, at this point, we're looking at tree addressability to decide whether or not there are memory dependencies here.
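At source level, the critical sequence has roughly the following shape: a 16-byte value is copied through a temporary using char-typed (ref-all) accesses and is then read back as eight 16-bit lanes. The sketch below is an assumption-laden illustration: struct Vec128's layout, the function name and the scalar loop are hypothetical, and the real testcase uses NEON vector types rather than arrays.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit wrapper; the real Vec128 is not shown in the thread. */
struct Vec128 {
    uint8_t raw[16];
};

/* Copy *b through a temporary with memcpy, then view the bytes as
   eight 16-bit lanes and shift each lane left by 8. */
static void copy_then_shift(const struct Vec128 *b, int16_t out[8])
{
    struct Vec128 tmp;   /* plays the role of D.33805 */
    struct Vec128 v;
    int16_t lanes[8];
    assert(sizeof lanes == sizeof v);

    memcpy(&tmp, b, sizeof tmp);       /* char-typed copy, alias set 0 */
    v = tmp;                           /* struct-typed store */
    memcpy(lanes, &v, sizeof lanes);   /* read back as short lanes */

    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(uint16_t)((uint16_t)lanes[i] << 8);
}
```

The miscompilation discussed in the thread corresponds to the lane load being scheduled ahead of the store it depends on; the sketch only shows the access pattern and is not claimed to reproduce the bug.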
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #20 from Richard Earnshaw --- Created attachment 57928 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57928&action=edit fully preprocessed testcase
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 Richard Biener changed: What|Removed |Added Priority|P1 |P2
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 --- Comment #19 from Richard Earnshaw ---
This is another problem with (I suspect) incorrect aliasing information. If I compile with -fno-strict-aliasing, I get

  88: f4432a1f  vst1.8 {d18-d19}, [r3 :64]    // {>E} SP+96/16
  8c: f4420a1f  vst1.8 {d16-d17}, [r2 :64]    // {>A} SP+32/16
  90: e893000f  ldm r3, {r0, r1, r2, r3}      // {G} SP+128/16
  98: eddd0b20  vldr d16, [sp, #128] ; 0x80   // {B} SP+48/16
  a4: e28dc040  add ip, sp, #64 ; 0x40
  a8: e885000f  stm r5, {r0, r1, r2, r3}      // {>F} SP+112/16
  ac: f2d80570  vshl.s16 q8, q8, #8
  b0: f3f503e0  vneg.s16 q8, q8
  b4: edcd0b20  vstr d16, [sp, #128] ; 0x80   // {>G.l} SP+128/8
  b8: edcd1b22  vstr d17, [sp, #136] ; 0x88   // {>G.h} SP+136/8
  bc: e894000f  ldm r4, {r0, r1, r2, r3}      // {C} SP+64/16
  c4: e28dc050  add ip, sp, #80 ; 0x50
  c8: e88c000f  stm ip, {r0, r1, r2, r3}      // {>D} SP+80/16
  cc: e885000f  stm r5, {r0, r1, r2, r3}      // {>F} SP+112/16

I've annotated each memory access with its stack address and labeled each 16-byte slot from A to G. With -fstrict-aliasing this becomes:

  88: f4420a1f  vst1.8 {d16-d17}, [r2 :64]    // {>A} SP+32/16
  8c: eddd0b20  vldr d16, [sp, #128] ; 0x80   // {E} SP+96/16
  98: e893000f  ldm r3, {r0, r1, r2, r3}      // {B} SP+48/16
  a0: e28dc040  add ip, sp, #64 ; 0x40
  a4: f2d80570  vshl.s16 q8, q8, #8
  a8: e884000f  stm r4, {r0, r1, r2, r3}      // {>G} SP+128/16 !
  ac: e885000f  stm r5, {r0, r1, r2, r3}      // {>F} SP+112/16
  b0: f3f503e0  vneg.s16 q8, q8
  b4: edcd0b20  vstr d16, [sp, #128] ; 0x80   // {>G.l} SP+128/8
  b8: edcd1b22  vstr d17, [sp, #136] ; 0x88   // {>G.h} SP+136/8
  bc: e894000f  ldm r4, {r0, r1, r2, r3}      // {C} SP+64/16
  c4: e28dc050  add ip, sp, #80 ; 0x50
  c8: e88c000f  stm ip, {r0, r1, r2, r3}      // {>D} SP+80/16
  cc: e885000f  stm r5, {r0, r1, r2, r3}      // {>F} SP+112/16

And we see that the initial store to G has been moved after the reads from it. I'm still digging, but it may be pertinent that the reads have been split into two separate instructions; perhaps when the split was done the alias sets weren't copied correctly.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231 Jeffrey A. Law changed: What|Removed |Added CC||law at gcc dot gnu.org Priority|P3 |P1
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Sam James changed:

What            |Removed     |Added
Target Milestone|---         |12.4
Status          |UNCONFIRMED |NEW
Ever confirmed  |0           |1
Last reconfirmed|            |2024-03-17

--- Comment #18 from Sam James ---
Confirmed.
[Bug target/111231] [12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111231

Sam James changed:

What         |Removed |Added
Known to fail|        |14.0
Known to work|        |11.4.1
Summary      |armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working) |[12/13/14 regression] armhf: Miscompilation with -O2/-fno-exceptions level (-fno-tree-vectorize is working)

--- Comment #17 from Sam James ---
Adding missing regression markers. 11 is fine for me.