[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963 Richard Biener changed: What|Removed |Added Last reconfirmed||2020-05-06 Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 Target Milestone|--- |11.0 --- Comment #1 from Richard Biener --- Confirmed. I've met the underlying issue when developing the patch and for this reason marked the conditional store inserted by LIM with no-warning. But for the testcase that's not enough since now PRE comes along and optimizes the var.field load away, re-exposing the issue. LIM transforms the testcase to (simplified a bit) void f (void) { if (pv != 0) { bool v2_set = false; bool varfield_set = false; int v2tem, varfield_tem; for (const P *ph = pv; ph < &pv[ps]; ++ph) switch (ph->p1) { case 1: v2tem = ph->p2; v2_set = true; break; case 2: varfield_tem = ph->p3; varfield_set = true; break; } if (varfield_set) var.field = varfield_tem; if (v2_set) v2 = v2tem; } if (var.field != 0) foo (&var); } where the uninit predicate analysis doesn't grok the relation between varfield_set and varfield_tem being initialized. The patch changed code generation to elide the previously emitted unconditional load of v2 and var.field. I suspected that for the case where there is no load the loop PHI for varfield_tem might be eliminated, but it is not in all cases it seems. Now apart from marking the store no-warning we could easily initialize the tems on loop entry, just not with their true value but for example with zero. That might result in less optimal out-of-SSA though (no coalescing with constants, the constant move needs to be emitted...) at least when the loop PHI is not eliminated. What works is initializing with an uninitialized variable marked TREE_NO_WARNING. I'm going to test that (eliding the no-warning on the conditional stores).
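As a rough source-level sketch of the zero-initialization alternative mentioned in comment #1 (this is not the patch under test, which initializes with an uninitialized variable marked TREE_NO_WARNING), the transformed function could look as follows. The types P and struct S, their field layout, and the counting foo() stub are assumptions made only so the fragment compiles on its own; the original glibc testcase is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical declarations; the real testcase defines its own.  */
typedef struct { int p1, p2, p3; } P;
struct S { int field; };

const P *pv;
size_t ps;
struct S var;
int v2;
int foo_calls;

/* Stub standing in for the external foo(); it only counts calls.  */
void foo (struct S *s) { (void) s; ++foo_calls; }

void f (void)
{
  if (pv != 0)
    {
      bool v2_set = false;
      bool varfield_set = false;
      /* Placeholder values on loop entry.  They are never stored unless
         the matching *_set flag is true, so zero is as good as any.  */
      int v2tem = 0, varfield_tem = 0;
      for (const P *ph = pv; ph < &pv[ps]; ++ph)
        switch (ph->p1)
          {
          case 1: v2tem = ph->p2; v2_set = true; break;
          case 2: varfield_tem = ph->p3; varfield_set = true; break;
          }
      /* Conditional stores materialized after the loop.  */
      if (varfield_set)
        var.field = varfield_tem;
      if (v2_set)
        v2 = v2tem;
    }
  if (var.field != 0)
    foo (&var);
}
```

With the temporaries defined on every path the uninit analysis has nothing left to complain about, at the possible cost of an extra constant move as described in the comment.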
[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963 --- Comment #2 from Richard Biener --- Created attachment 48459 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48459&action=edit patch in testing Testing the attached.
[Bug tree-optimization/94964] [8/9/10/11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359 since r8-2993-ga7976089dba5e227
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94964 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Target Milestone|--- |8.5 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- Mine. The loop does not have a preheader we can sink to so gsi_insert_seq_on_edge_immediate will split the edge and the following add_phi_arg breaks. Now, the loop entry edge is an EH edge in this case, will dig what the appropriate solution is.
[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- Huh. mine.
[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965 --- Comment #2 from Richard Biener --- @@ -9319,7 +9364,8 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi, initialized yet, use first_stmt_info_for_drptr DR by bumping the distance from first_stmt_info DR instead as below. */ if (!diff_first_stmt_info) - msq = vect_setup_realignment (first_stmt_info, gsi, &realignment_token, + msq = vect_setup_realignment (loop_vinfo, + first_stmt_info, gsi, &realignment_token, alignment_support_scheme, NULL_TREE, &at_loop); if (alignment_support_scheme == dr_explicit_realign_optimized) that should have been 'vinfo', not 'loop_vinfo'.
[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Richard Biener --- Fixed.
[Bug c/94968] [10/11 Regression] internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.c:87
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94968 Richard Biener changed: What|Removed |Added Priority|P3 |P4 Target Milestone|--- |10.2
[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5 Keywords||wrong-code Last reconfirmed||2020-05-06 Status|UNCONFIRMED |NEW Summary|Invalid loop distribution |[8/9/10/11 Regression] ||Invalid loop distribution Known to work||7.5.0 Ever confirmed|0 |1 --- Comment #3 from Richard Biener --- Confirmed. Works fine in GCC 7 which also says Creating dr for f[pretmp_5].e analyze_innermost: Applying pattern match.pd:84, generic-match.c:11461 failed: bit offset alignment. base_address: offset from base address: constant offset from base address: step: aligned to: base_object: f Access function 0: 7 Access function 1: pretmp_5 but (compute_affine_dependence stmt_a: f[pretmp_5] = g; stmt_b: _2 = f[pretmp_5].e; ) -> dependence analysis failed instead of (compute_affine_dependence stmt_a: f[pretmp_5] = g; stmt_b: _2 = f[pretmp_5].e; (analyze_overlapping_iterations (chrec_a = pretmp_5) (chrec_b = pretmp_5) (overlap_iterations_a = [0]) (overlap_iterations_b = [0])) )
[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution since r8-2390-gdfbddbeb1ca912c9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969 --- Comment #5 from Richard Biener --- So I think the issue is not dependence testing but loop distribution accepting a zero dependence distance as OK. Of course dependence analysis is quite useless here since the accesses are to the same location in every iteration. Bin, maybe you can share your thoughts on this issue? The testcase doesn't need bitfields - those just disable the cost model which otherwise prevents the distribution. diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c index 44423215332..ac272d63c3d 100644 --- a/gcc/tree-loop-distribution.c +++ b/gcc/tree-loop-distribution.c @@ -2852,6 +2852,7 @@ loop_distribution::finalize_partitions (class loop *loop, /* Don't distribute current loop into too many loops given we don't have memory stream cost model. Be even more conservative in case of loop nest distribution. */ +#if 0 if ((same_type_p && num_builtin == 0 && (loop->inner == NULL || num_normal != 2 || num_partial_memset != 1)) || (loop->inner != NULL @@ -2867,6 +2868,7 @@ loop_distribution::finalize_partitions (class loop *loop, } partitions->truncate (1); } +#endif /* Fuse memset builtins if possible. */ if (partitions->length () > 1) makes the testcase miscompiled even with the : 7 and : 2 commented, so plain struct S { signed m; signed e; };
[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution since r8-2390-gdfbddbeb1ca912c9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969 --- Comment #6 from Richard Biener --- Before Richards change we likely gave up on the mismatch in access function dimensionality for f[b] vs. f[b].e but now we compute a dependence distance of zero.
[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #4 from Richard Biener --- Should be fixed.
[Bug tree-optimization/94964] [8/9/10 Regression] ICE in add_phi_arg, at tree-phinodes.c:359 since r8-2993-ga7976089dba5e227
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94964 Richard Biener changed: What|Removed |Added Known to fail|11.0|10.0 Summary|[8/9/10/11 Regression] ICE |[8/9/10 Regression] ICE in |in add_phi_arg, at |add_phi_arg, at |tree-phinodes.c:359 since |tree-phinodes.c:359 since |r8-2993-ga7976089dba5e227 |r8-2993-ga7976089dba5e227 Known to work||11.0 Priority|P3 |P2 --- Comment #3 from Richard Biener --- Fixed on trunk so far.
[Bug target/94865] Failure to combine unpckhpd+unpcklpd into blendps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94865 --- Comment #2 from Richard Biener --- Missing match.pd patterns also include a no-op comb of insertion of an extracted element at the same position (simplify (bit_insert @0 (BIT_FIELD_REF @0 @size @pos) @pos) (if (size matches) @0) in addition to the requested (simplify (bit_insert @0 (BIT_FIELD_REF @1 @rsize @rpos) @ipos) (if (@0 and @1 are vectors compatible for a vec_perm) (vec_perm @0 @1 { shuffle-mask }))
[Bug c++/94973] compile error when trying to use pointer-to-member function as invokable projection to ranges::find()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94973 --- Comment #13 from Richard Biener --- Does MSVC still accept that [without diagnostic]? Maybe it's time to remove it completely...
[Bug fortran/94978] [8/9/10/11 Regression] Bogus warning "Array reference at (1) out of bounds in loop beginning at (2)"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94978 Richard Biener changed: What|Removed |Added Target Milestone|--- |8.5 Keywords||diagnostic
[Bug target/94865] Failure to combine unpckhpd+unpcklpd into blendps
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94865 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Known to work||11.0 Known to fail||10.0 --- Comment #33 from Richard Biener --- Fixed on trunk.
[Bug target/94980] [8/9/10/11 Regression] ICE: verify_gimple failed: position plus size exceeds size of referenced object in 'bit_field_ref' with -mavx512vl
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94980 Richard Biener changed: What|Removed |Added Keywords||wrong-code Priority|P3 |P2 Target Milestone|--- |8.5
[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Richard Biener --- Addressed by the patch for PR94865.
[Bug tree-optimization/88540] Issues with vectorization of min/max operations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540 Richard Biener changed: What|Removed |Added Blocks||94864 Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #4 from Richard Biener --- Mine. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864 [Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd
[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Known to work||11.0 --- Comment #6 from Richard Biener --- Fixed for GCC 11.
[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Last reconfirmed||2020-05-08 Status|UNCONFIRMED |ASSIGNED Blocks||57359 Target Milestone|--- |11.0 Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- Ah, forgot to update this testcase. This is another instance of PR57359, that is, we may not sink the store to b across the store to *b since b may point to itself and with j == 1 we'd change b = b + 2; *b = x; to *b = x; b = b + 2; Note there's a twist for this particular case, namely the preceding load of 'b' gives us knowledge about the dynamic type of 'b' which means we could use that to assess that we _can_ exchange the stores. But that logic is not implemented. I'll see how to do that. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 [Bug 57359] store motion causes wrong code for union access at -O3
[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 --- Comment #2 from Richard Biener --- (In reply to Richard Biener from comment #1) > Ah, forgot to update this testcase. This is another instance of PR57359, > that is, we may not sink the store to b across the store to *b since b may > point > to itself and with j == 1 we'd change > > b = b + 2; > *b = x; > > to > > *b = x; > b = b + 2; > > note there's a twist for this particular case, namely the preceeding load > of 'b' gives us knowledge about the dynamic type of 'b' which means we > could use that to assess that we _can_ exchange the stores. > > But that logic is not implemented. > > I'll see how to do that. OK, we can't. Consider the following which we miscompile with GCC 10 but which is fixed on trunk. bar () is simply the inner loop of bar in the pr64110.c testcase. GCC 10 and earlier transform b++; *b = x; to tem = b + 1; *b = x; b = tem; which is wrong with b == &b, the *b = x store re-purposes the storage in 'b'. short *b; void __attribute__((noipa)) bar (short x, int j) { for (int i = 0; i < j; ++i) *b++ = x; } int main() { b = (short *)&b; bar (0, 1); if ((short)(unsigned long)b != 0) __builtin_abort (); return 0; } Now the only thing that can be done (as noted in PR57359) is re-materializing _both_ stores on the exit. Thus turn for (int i = 0; i < j; ++i) { tem = b; tem = tem + 1; b = tem; *tem = x; } into tem = b; for (int i = 0; i < j; ++i) { tem = tem + 1; *tem = x; } b = tem; *tem = x; when applying store-motion. Note this only works when b is written to unconditionally. It also needs some kind of a cost model I guess...
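The re-materialization described in comment #2 can be written out by hand to check that both loop shapes agree when b does not point to itself. This is only an illustrative sketch: the function names are made up, and the early return for j <= 0 is an assumption standing in for the fact that the exit stores are only executed when the loop body ran.

```c
short *b;   /* global pointer, as in the testcase */

/* Loop as written in the comment: b is loaded and stored in every
   iteration.  */
void loop_original (short x, int j)
{
  for (int i = 0; i < j; ++i)
    {
      short *tem = b;
      tem = tem + 1;
      b = tem;
      *tem = x;
    }
}

/* Hand-applied store motion: b is loaded once before the loop and
   _both_ stores are re-materialized on the exit, in program order.  */
void loop_store_motion (short x, int j)
{
  if (j <= 0)          /* assumption: nothing to do if the body never runs */
    return;
  short *tem = b;
  for (int i = 0; i < j; ++i)
    {
      tem = tem + 1;
      *tem = x;
    }
  b = tem;
  *tem = x;
}
```

For an ordinary buffer both functions leave b and the buffer contents in the same state. The b == &b case from the testcase is deliberately not exercised here.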
[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703 --- Comment #9 from Richard Biener --- (In reply to Rainer Orth from comment #7) > Created attachment 48483 [details] > 32-bit sparc-sun-solaris2.11 pr94703.c.021t.ssa > > The new testcase FAILs on sparc-sun-solaris2.11 (both 32 and 64-bit): > > +FAIL: gcc.dg/tree-ssa/pr94703.c scan-tree-dump ssa "No longer having > address taken: r" Hmm, OK looks like memcpy is not folded, likely because the source is not known to be appropriately aligned. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c index 7209fa0a4d4..eadea45a32f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c @@ -4,6 +4,7 @@ unsigned int set_lowpart (unsigned int const *X) { unsigned int r = 0; + X = __builtin_assume_aligned (X, sizeof (unsigned int) / 2); __builtin_memcpy(&r,X,sizeof (unsigned int) / 2); return r; } should fix this. Can you verify and if so, commit? Thx.
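For reference, with the diff above applied the function in the testcase would read roughly as below. Only the function body is shown; the dg- directives at the top of the real test file are omitted, and the half-word size assumes the usual 4-byte unsigned int.

```c
unsigned int set_lowpart (unsigned int const *X)
{
  unsigned int r = 0;
  /* Tell GCC the source is at least 2-byte aligned so that the small
     memcpy below can be folded on strict-alignment targets too.  */
  X = __builtin_assume_aligned (X, sizeof (unsigned int) / 2);
  /* Copy the first half of *X into the first half of r.  */
  __builtin_memcpy (&r, X, sizeof (unsigned int) / 2);
  return r;
}
```

Which half of the value ends up in r depends on the target's byte order.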
[Bug tree-optimization/95001] std::terminate() and abort() do not have __builtin_unreachable() semantics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95001 --- Comment #1 from Richard Biener --- Sorry, but noreturn functions can have side-effects that need to be preserved.
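A small C illustration of the point in comment #1, not taken from the bug report: the names fail(), checked() and fail_count are hypothetical. The noreturn function below never returns to its caller, yet it has an observable side effect, so the path leading to the call cannot simply be treated as unreachable.

```c
#include <setjmp.h>

static jmp_buf recover;   /* hypothetical recovery point */
int fail_count;           /* side effect that must be preserved */

/* Never returns to its caller, but is not free of side effects.  */
__attribute__ ((noreturn)) void fail (void)
{
  ++fail_count;
  longjmp (recover, 1);
}

/* Returns x, or -1 if fail() was reached for a negative x.  */
int checked (int x)
{
  if (setjmp (recover))
    return -1;
  if (x < 0)
    fail ();
  return x;
}
```

If the call to fail() had __builtin_unreachable() semantics, the compiler would be free to drop the x < 0 branch together with the counter update.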
[Bug bootstrap/94998] GCC 10 won't configure for host=x86, build!=host, linker=bfd due to CET
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94998 Richard Biener changed: What|Removed |Added Status|WAITING |NEW Component|target |bootstrap Host||x86_64-linux --- Comment #2 from Richard Biener --- Ugh.
[Bug middle-end/94994] [10/11 Regression] possible miscompilation of word-at-a-time copy via packed structs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994 Richard Biener changed: What|Removed |Added Last reconfirmed||2020-05-08 Target Milestone|--- |10.2 Status|UNCONFIRMED |NEW Priority|P3 |P2 Keywords||wrong-code Ever confirmed|0 |1 --- Comment #2 from Richard Biener --- Confirmed.
[Bug middle-end/95021] [10/11 Regression] Bogus -Wclobbered warning
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95021 Richard Biener changed: What|Removed |Added Keywords||diagnostic CC|rguenther at suse dot de |law at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Target||x86_64-*-* --- Comment #3 from Richard Biener --- IIRC Jeff was working on replacing -Wclobbered
[Bug target/95023] Offloading AMD GCN wiki cannot be followed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95023 Richard Biener changed: What|Removed |Added Target||gcn Keywords||documentation --- Comment #1 from Richard Biener --- It's upstream newlib, https://sourceware.org/newlib/
[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED
[Bug sanitizer/95033] [11 Regression] ICE in set_parm_rtl, at cfgexpand.c:1310 since r11-165-geb72dc663e9070b2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95033 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.0
[Bug tree-optimization/95045] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Last reconfirmed||2020-05-11 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Mine.
[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025 --- Comment #2 from Richard Biener --- (In reply to David Binderman from comment #1) > I see this bug also. Another C test case is available on request. Please attach it.
[Bug tree-optimization/95045] [11 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045 --- Comment #2 from Richard Biener --- OK, this one is an interesting one (might be also latent before the rewrite). I'll deal with it separately. The issue is around the inner loop having multiple exits, one being also the exit from the outer loop and edge inserts on that edge getting mis-ordered (we commit them only after processing all inserts).
[Bug tree-optimization/95049] GCC never terminates with trivial input program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95049 Richard Biener changed: What|Removed |Added Component|c |tree-optimization Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2020-05-11 Status|UNCONFIRMED |ASSIGNED --- Comment #1 from Richard Biener --- Mine.
[Bug c/95052] Excess padding of partially initialized strings/char arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95052 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-11 Keywords||missed-optimization --- Comment #1 from Richard Biener --- I'm not sure what you describe as padding is padding. Instead it's valid to access all elements of the array you declare and thus it must be initialized. What could be done is elide zero-padding parts to a memset() call.
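To make the point about initialization concrete, here is a minimal sketch; the array size and the helper name are arbitrary choices, not from the report. In C every element not covered by the string literal is initialized to zero, and the program may legitimately read all of them.

```c
#include <stddef.h>

/* Returns 1 if every element after the explicit "ab" is zero.  */
int tail_is_zero (void)
{
  char buf[16] = "ab";   /* elements 2..15 are implicitly zero */
  for (size_t i = 2; i < sizeof buf; ++i)
    if (buf[i] != 0)
      return 0;
  return buf[0] == 'a' && buf[1] == 'b';
}
```

The zero tail is what could be emitted as a memset() call instead of being part of a full-size initializer image.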
[Bug tree-optimization/95051] error: invalid RHS for gimple memory store:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95051 Richard Biener changed: What|Removed |Added Component|c |tree-optimization Version|unknown |11.0 CC||marxin at gcc dot gnu.org, ||rguenth at gcc dot gnu.org Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Depends on||95033 Last reconfirmed||2020-05-11 --- Comment #3 from Richard Biener --- Confirmed, looks related to PR95033 The ICE occurs in sanopt Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95033 [Bug 95033] [11 Regression] ICE in set_parm_rtl, at cfgexpand.c:1310 since r11-165-geb72dc663e9070b2
[Bug tree-optimization/95049] [9/10/11 Regression] GCC never terminates with trivial input program
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95049 Richard Biener changed: What|Removed |Added Summary|GCC never terminates with |[9/10/11 Regression] GCC |trivial input program |never terminates with ||trivial input program Target Milestone|--- |9.4 Priority|P3 |P2
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Bug 57359 depends on bug 90668, which changed state. Bug 90668 Summary: loop invariant moving a dependent store out of a loop https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90668 What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE
[Bug tree-optimization/90668] loop invariant moving a dependent store out of a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90668 Richard Biener changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |DUPLICATE --- Comment #5 from Richard Biener --- Dup. *** This bug has been marked as a duplicate of bug 57359 ***
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Richard Biener changed: What|Removed |Added CC||msebor at gcc dot gnu.org --- Comment #34 from Richard Biener --- *** Bug 90668 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/95056] [11 Regression] slp-perm-9.c fails on aarch64 after gbc484e250990393e887f7239157cc85ce6fadcce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95056 Richard Biener changed: What|Removed |Added Version|10.0|11.0 Component|target |tree-optimization Target Milestone|--- |11.0 Keywords||missed-optimization Summary|slp-perm-9.c fails on |[11 Regression] |aarch64 after |slp-perm-9.c fails on |gbc484e250990393e887f723915 |aarch64 after |7cc85ce6fadcce |gbc484e250990393e887f723915 ||7cc85ce6fadcce --- Comment #1 from Richard Biener --- Hmm, load-lane support should be unaffected (but I didn't test obviously). I hope aarch64 folks can investigate - eventually the permute check done in vectorizable_load needs adjustment / moving.
[Bug target/95055] [11 Regression] gcc.dg/compat/scalar-by-value-3 fails on aarch64 after r11-165-geb72dc663e9070b281be83a80f6f838a3a878822
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95055 Richard Biener changed: What|Removed |Added Summary|gcc.dg/compat/scalar-by-val |[11 Regression] |ue-3 fails on aarch64 after |gcc.dg/compat/scalar-by-val |r11-165-geb72dc663e9070b281 |ue-3 fails on aarch64 after |be83a80f6f838a3a878822 |r11-165-geb72dc663e9070b281 ||be83a80f6f838a3a878822 Target Milestone|--- |11.0 Version|10.0|11.0 CC||rguenth at gcc dot gnu.org Keywords||wrong-code
[Bug fortran/95053] [11.0 regression] ICE in f951: gfc_divide()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.0
[Bug tree-optimization/95058] [11 regression] vector test case failures starting with r11-205
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95058 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.0 Component|other |tree-optimization --- Comment #1 from Richard Biener --- Can you attach the dumps for power7 and "the rest"?
[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Richard Biener --- Fixed.
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Bug 57359 depends on bug 95025, which changed state. Bug 95025 Summary: [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Richard Biener --- Fixed.
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Bug 57359 depends on bug 94988, which changed state. Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/95045] [11 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #4 from Richard Biener --- Fixed.
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Bug 57359 depends on bug 95045, which changed state. Bug 95045 Summary: [11 Regression] wrong code at -O3 on x86_64-linux-gnu https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug libgomp/95062] [10/11 Regression] libgomp.oacc-c-c++-common/pr92843-1.c:26: verify_array: Assertion `array[i] == value' failed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95062 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.2
[Bug c++/95063] [11 Regression] ICE in tsubst_decl, at cp/pt.c:14633
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95063 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.0
[Bug tree-optimization/95060] vfnmsub132ps is not generated with -ffast-math
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95060 Richard Biener changed: What|Removed |Added Version|unknown |11.0 Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-12 Keywords||missed-optimization Ever confirmed|0 |1 Target||x86_64-*-* i?86-*-* --- Comment #3 from Richard Biener --- FMA generation already folds the FMA stmt: if (cond) fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1, op2, addop, else_value); else fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt)); gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun, use_stmt)); gsi_replace (&gsi, fma_stmt, true); /* Follow all SSA edges so that we generate FMS, FNMA and FNMS regardless of where the negation occurs. */ gimple *orig_stmt = gsi_stmt (gsi); if (fold_stmt (&gsi, follow_all_ssa_edges)) { if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi))) gcc_unreachable (); update_stmt (gsi_stmt (gsi)); but not the negate it feeds since with -ffast-math we have -((a[i] * b[i]) + c[i]) as canonical form it seems (reassoc does this). float r[8], a[8], b[8], c[8]; void test_fnms (void) { for (int i = 0; i < 8; i++) r[i] = -((a[i] * b[i]) + c[i]); } would be an alternative testcase, not handled without -ffast-math either. I'd suggest to fold the single-use stmt of the fma_stmts lhs if any [and if it is a negate].
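The negated form and the form an FNMS instruction computes, -(a*b) - c, are the same value, which is what folding the negate into the FMA relies on. A small check, assuming inputs that are exactly representable so that rounding and contraction cannot make a difference; the array and function names are illustrative only:

```c
float r_neg[8], r_fnms[8], a[8], b[8], c[8];

/* Form discussed above: negate applied to the multiply-add result.  */
void test_negated_fma (void)
{
  for (int i = 0; i < 8; i++)
    r_neg[i] = -((a[i] * b[i]) + c[i]);
}

/* Same value written the way FNMS computes it: -(a*b) - c.  */
void test_fnms_form (void)
{
  for (int i = 0; i < 8; i++)
    r_fnms[i] = (-a[i]) * b[i] - c[i];
}
```

Whether either loop is actually emitted as vfnmsub132ps depends on the compiler version and flags; the sketch only shows the arithmetic equivalence.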
[Bug fortran/95067] [9/10/11 Regression] ICE in tree_fits_shwi_p, at tree.c:7262
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95067 Richard Biener changed: What|Removed |Added Target Milestone|--- |9.4 --- Comment #2 from Richard Biener --- That commit looks totally unrelated ... but it's eventually that /* If there was an input error and we don't really have a type, avoid crashing and write something that is at least valid by assuming `int'. */ if (type == error_mark_node) type = integer_type_node; in dbxout_type makes us later use uninitialized low/high. using void_type_node might be less error-prone here. Untested suggestion, that is. Take it or leave it ;) (stabs should go away)
[Bug middle-end/95072] [10/11 Regression] -Warray-bounds false positive with flexible array bounds (regression from GCC 9)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95072 Richard Biener changed: What|Removed |Added Summary|-Warray-bounds false|[10/11 Regression] |positive with flexible |-Warray-bounds false |array bounds (regression|positive with flexible |from GCC 9) |array bounds (regression ||from GCC 9) Priority|P3 |P2 Target Milestone|--- |10.2
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #19 from Richard Biener --- Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?). Note we see Estimating sizes for loop 3 BB: 14, after_exit: 0 size: 1 _20 = count[n_95]; size: 1 _21 = _20 + 1; size: 1 count[n_95] = _21; size: 1 _22 = stride[n_95]; size: 0 _23 = (long unsigned int) _22; size: 1 _44 = _23 - _82; size: 1 _45 = _44 * 4; size: 1 src_62 = src_85 + _45; size: 1 _25 = extent[n_95]; size: 2 if (_21 == _25) BB: 20, after_exit: 1 BB: 13, after_exit: 0 size: 1 count[n_95] = 0; size: 1 _18 = _22 * _25; size: 0 _19 = (long unsigned int) _18; size: 1 n_60 = n_95 + 1; Induction variable computation will be folded away. size: 2 if (dim_43 == n_60) Exit condition will be eliminated in last copy. size: 15-1, last_iteration: 15-3 Loop size: 15 Estimated size after unrolling: 129 Making edge 13->20 impossible by redistributing probability to other edges. ../../../trunk/libgfortran/generated/in_pack_i4.c:100:14: optimized: loop with 13 iterations completely unrolled (header execution count 23565294) Last iteration exit edge was proved true. Note even with the rs6000 limits turned back to default I see the loop unrolled (with -O3 or -O2 -funroll-loops). Checking on x86_64 the file is compiled with -O2 only and we have size: 17-1, last_iteration: 10-3 Loop size: 17 Estimated size after unrolling: 154 Not unrolling loop 3: size would grow. so what's the speciality on POWER? Code growth should trigger with -O3 only. Given we have only a guessed profile (and that does not detect the inner loop as completely cold) we're allowing growth then. GCC has no idea the outer loop iterates more than the inner. Note re-structuring the loop to use down-counting count[] from extent[] to zero would be worth experimenting with, likewise "peeling" the dim == 0 loop and not making the outermost loop key on 'src' (can 'src' be NULL on entry?). 
Anyway, completely peeling this loop looks useless - the only benefit might be better branch prediction (each dimension gets its own entry in the predictor cache). If POWER cannot cope with large loops then I wonder why POWER people increased limits (though even the default limits would unroll the loop). Thomas - where did you measure the slowness? For which dimensionality? I'm quite sure the loop structure will be sub-optimal for certain input shapes... (stride0 == 1 could even use memcpy for the inner dimension).
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #20 from Richard Biener --- (In reply to Jiu Fu Guo from comment #18) > Currently, I'm thinking to enhance GCC 'cunroll' as: > if the loop has multi-exits or upbound is not a fixed number, we may not do > 'complete unroll' for the loop, except -funroll-all-loops is specified. That doesn't make much sense (-funroll-all-loops is RTL unroller only). I think the growth limits are simply too large unless we compute a "win" which we in this case do not. So I'd say the growth limits should scale with win ^ (1/new param) thus if we estimate to eliminate 20% of the loop stmts due to unrolling then the limit to apply is limit * (0.2 ^ (1/X)) with X maybe defaulting to 2. I'd only apply this new limit for peeling (peeling is when the loop count is not constant and thus we keep the exit tests). Of course people want more peeling (hello POWER people!)
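Worked out for the numbers given in comment #20, with the subscript "peel" being only a label chosen here for the scaled limit:

```latex
\[
  \text{limit}_{\text{peel}} = \text{limit}\cdot \text{win}^{1/X},
  \qquad
  \text{win} = 0.2,\ X = 2
  \;\Rightarrow\;
  \text{limit}_{\text{peel}} = \text{limit}\cdot\sqrt{0.2}
  \approx 0.447\cdot\text{limit}.
\]
```

So an estimated 20% win would leave a bit less than half of the current growth limit for peeling, and a larger X would move the factor closer to 1.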
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #23 from Richard Biener ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > if the loop has multi-exits or upbound is not a fixed number, we may not do
> > 'complete unroll' for the loop, except -funroll-all-loops is specified.
>
> That doesn't make much sense (-funroll-all-loops is RTL unroller only).
>
> I think the growth limits are simply too large unless we compute a "win"
> which we in this case do not. So I'd say the growth limits should scale
> with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> loop stmts due to unrolling then the limit to apply is
> limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
>
> I'd only apply this new limit for peeling (peeling is when the loop count
> is not constant and thus we keep the exit tests).
>
> Of course people want more peeling (hello POWER people!)

Btw, the issue with the rs6000 code at present is that it uses unroll_only_small_loops, but that only affects the RTL unroller, while enabling -funroll-loops at -O2 affects GIMPLE as well, and there it is unconstrained (it runs with the -O3 params). For the main unroll pass (not cunrolli) this triggers code size growth:

  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
                                                   || flag_peel_loops
                                                   || optimize >= 3, true);

The "original" patch also adjusted parameters. If the intent is to only affect the RTL unroller then we need a separate flag controlling it (yeah, using the same flags as heuristic trigger was probably bad).
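To make the consequence of that call explicit, here is a small C sketch of the condition passed as its first argument. The function name cunroll_may_increase_size is hypothetical and the flags are modelled as plain parameters; only the boolean expression itself is taken from the call quoted above.

```c
#include <stdbool.h>

/* Hypothetical wrapper around the expression from the quoted call:
   the main GIMPLE unroll pass is allowed to grow code as soon as
   either flag is set, regardless of the optimization level.  */
bool
cunroll_may_increase_size (bool flag_unroll_loops, bool flag_peel_loops,
                           int optimize)
{
  return flag_unroll_loops || flag_peel_loops || optimize >= 3;
}
```

So a target that turns on -funroll-loops at -O2 gets the same answer here as -O3 does, which is why a separate flag for the RTL unroller is suggested.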
[Bug debug/95080] [10/11 Regression] -fcompare-debug failure (length) with -Og -fcse-follow-jumps -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95080 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.2
[Bug target/95078] Missing fwprop for SIB address
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95078 --- Comment #1 from Richard Biener ---
TER should go away, not be extended. So you are suggesting that we replace

        leaq    44(%rdi,%rdx,4), %rdx   --- redundant could be fwprop
        movl    (%rdx), %eax
        movl    $3, (%rsi)
        addl    (%rdx), %eax

with

        movl    44(%rdi,%rdx,4), %eax
        movl    $3, (%rsi)
        addl    44(%rdi,%rdx,4), %eax

? The variant that looks bigger is actually one byte smaller. Note as soon as there are three uses it will be larger again... So this is really something for RTL and yeah, fwprop only makes "local" decisions.

Note that I think that your proposed variant will consume more resources since the complex addressing modes are likely split into a separate uop. Yes, overall I'd expect less latency for your sequence.
[Bug debug/95077] Wrong backtrace infromation at O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95077 Richard Biener changed: What|Removed |Added Known to fail||9.3.1 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2020-05-12 --- Comment #1 from Richard Biener --- Confirmed.
[Bug target/95076] Failure to tail-call on function call of different return type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95076 Richard Biener changed: What|Removed |Added Summary|Failure to optimize out |Failure to tail-call on |stack alignment on function |function call of different |call of different type on |return type |x86 | CC||hjl.tools at gmail dot com Target||x86_64-*-* i?86-*-* --- Comment #1 from Richard Biener --- GCC doesn't tail-call because the return types are not compatible. With a call it cannot optimize the stack adjustment because of the ABI. Note I'm not sure whether the ABI allows %rax to contain "garbage" in the upper half for a function returning in %eax. So what LLVM does may be wrong.
[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359 Bug 57359 depends on bug 94988, which changed state. Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED
[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988 Richard Biener changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Richard Biener --- Fixed.
[Bug tree-optimization/95058] [11 regression] vector test case failures starting with r11-205
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95058 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Last reconfirmed||2020-05-12 --- Comment #6 from Richard Biener --- OK, so for non 7 BE we end up not vectorizing because it doesn't look profitable which IMHO is good. It would be nice to also see dumps before the respective rev. because in theory (well...) the cost computation should be the same. Ah! OK, so we now have 0x10002001470 _1 1 times vec_construct costs 2 in prologue 0x10002001470 _1 1 times vec_construct costs 2 in prologue 0x10002001470 _1 2 times vector_store costs 2 in body 0x10001ecfcc0 _1 1 times scalar_store costs 1 in body 0x10001ecfcc0 _2 1 times scalar_store costs 1 in body 0x10001ecfcc0 _3 1 times scalar_store costs 1 in body 0x10001ecfcc0 _4 1 times scalar_store costs 1 in body that is, the SLP graph has the expected cost. Originally we likely had costed against 4 scalar stores and 4 scalar loads (but the scalar loads will still be there). On x86_64 we get 0x3975280 _1 1 times vec_construct costs 8 in prologue 0x3975280 _1 1 times vec_construct costs 8 in prologue 0x3975280 _1 2 times vector_store costs 24 in body 0x3942cb0 _1 1 times scalar_store costs 12 in body 0x3942cb0 _2 1 times scalar_store costs 12 in body 0x3942cb0 _3 1 times scalar_store costs 12 in body 0x3942cb0 _4 1 times scalar_store costs 12 in body so it's still profitable there. Note I suggest to leave the FAILs in place for now since in my dev tree I see the vec_construct gone again so it would start passing again on ppc as well. Sorry for the intermediate breakage.
[Bug target/95083] New: x86 fp_movcc expansion depends on real_cst sharing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95083
Bug ID: 95083
Summary: x86 fp_movcc expansion depends on real_cst sharing
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

I see gcc.target/i386/avxfp-1.c FAILing, which is

  double x;
  void t() { x=x>5?x:5; }

  double x;
  void q() { x=x<5?x:5; }

and q() recognized as FP min by ix86_expand_fp_movcc because the comparison doesn't pass prepare_cmp_insn () and later ifcvt matches up the originally distinct pseudos for the two mentions of '5'. For t() prepare_cmp_insn () succeeds and ix86_expand_fp_movcc expands this to a UNSPEC_BLEND (because the two mentions of '5' get a different pseudo so this doesn't look like a max).

The first prepare_cmp_insn fails because it is fed

  (lt (reg:DF 82 [ x.3_1 ]) (const_double:DF 5.0e+0 [0x0.ap+3]))

and apparently we cannot do a lt compare(?) (but later during ifcvt we can).

Note the above is when expanding from a COND_EXPR, thus

  t ()
  {
    double x.1_1;
    double iftmp.0_3;

    ;; basic block 2, loop depth 0
    ;;pred: ENTRY
    x.1_1 = x;
    iftmp.0_3 = x.1_1 > 5.0e+0 ? x.1_1 : 5.0e+0;
    x = iftmp.0_3;
    return;

and

  q ()
  {
    double x.3_1;
    double iftmp.2_3;

    ;; basic block 2, loop depth 0
    ;;pred: ENTRY
    x.3_1 = x;
    iftmp.2_3 = x.3_1 < 5.0e+0 ? x.3_1 : 5.0e+0;
    x = iftmp.2_3;
    return;

Similar FAILs occur for

  FAIL: gcc.target/i386/avxfp-1.c scan-assembler vmaxsd
  FAIL: gcc.target/i386/avxfp-2.c scan-assembler vminsd
  FAIL: gcc.target/i386/ssefp-1.c scan-assembler maxsd
  FAIL: gcc.target/i386/ssefp-2.c scan-assembler minsd

So what's missing is simplification of

  Trying 8 -> 9:
  8: r87:DF=r85:DF
[Bug target/95083] x86 fp_movcc expansion depends on real_cst sharing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95083 Richard Biener changed: What|Removed |Added Version|10.0|11.0 Keywords||missed-optimization CC||uros at gcc dot gnu.org Target||x86_64-*-* i?86-*-* --- Comment #1 from Richard Biener --- Needs https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545588.html to reproduce.
[Bug tree-optimization/95084] New: code sinking prevents if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084
Bug ID: 95084
Summary: code sinking prevents if-conversion
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

There's a pass ordering issue between the sink pass and tree-if-conv, the if-conversion for vectorization. When sink sinks a possibly trapping operation to a place that is only conditionally executed, if-conversion fails, which results in failed vectorization. This can be seen with https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545588.html applied for gcc.dg/vect/pr56541.c (and its ifcvt counterpart gcc.dg/tree-ssa/ifc-pr56541.c). But I've also seen this in other contexts. Here

  iftmp.2_17 = rR_19 < rL_20 ? rR_19 : rL_20;
  iftmp.3_3 = rR_19 < rL_20 ? rL_20 : rR_19;
  if (iftmp.3_3 > 0.0) goto ; [INV] else goto ; [INV]
  :
  :
  # iftmp.4_14 = PHI
  if (iftmp.4_14 > 0.0)

becomes

  iftmp.3_3 = rR_17 < rL_18 ? rL_18 : rR_17;
  if (iftmp.3_3 > 0.0) goto ; [59.00%] else goto ; [41.00%]
  [local count: 435831803]:
  goto ; [100.00%]
  [local count: 627172605]:
  iftmp.2_15 = rR_17 < rL_18 ? rR_17 : rL_18;
  if (iftmp.2_15 > 0.0)

and the now conditionally executed FP comparison can trap.
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #28 from Richard Biener ---
> It the growth limit seems could be refined. The ^ is an exponent operation,
> right?

Yes. The idea is to limit growth more when there is no benefit of unrolling detected by the cost model (which currently simply counts likely eliminated stmts).

(In reply to Jiu Fu Guo from comment #27)
> (In reply to Jiu Fu Guo from comment #26)
> > (In reply to Richard Biener from comment #20)
> > > (In reply to Jiu Fu Guo from comment #18)
> > > > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > > > if the loop has multi-exits or upbound is not a fixed number, we may
> > > > not do
> > > > 'complete unroll' for the loop, except -funroll-all-loops is specified.
> > >
> > > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> >
> For the loop which has multi-exits, it may not helpful to unroll it,
> especially "complete unroll" may be not helpful. Like loop in in_pack_i4.c.
> Since it would early exit, some iterations(may most iterations) were not
> executed.
>
> Is it a good idea to disable the GIMPLE cunroll for this kind of loop? RTL
> unroll_stupid does not unroll this kind of loop either.

Well, GIMPLE cunroll specifically handles the situation of peeling such loops and has a separate --param to control how many extra branches it may introduce for those exits. Generally disabling unrolling of such loops isn't a good idea: the reason for completely unrolling loops is abstraction removal and not necessarily producing more optimal loop kernels (the loop is gone afterwards).

One of my TODO items is to work on its costing model to the extent that we run value-numbering on the unrolled body (that's already done) and roll back the unrolling if there wasn't any visible benefit. The difficult cases are like those in SPEC calculix where for full benefit you need to unroll the 5(!) innermost loops and to even see any benefit you need to unroll the 3 innermost loops.
[Bug tree-optimization/95097] Missed optimization with bitfield value ranges
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097 --- Comment #3 from Richard Biener ---
Just to quote, EVRP sees

  :
  _1 = VIEW_CONVERT_EXPR(f);
  _2 = _1 & 1048575;
  if (_2 != 0) goto ; [INV] else goto ; [INV]
  :
  _3 = f.x;
  _4 = (unsigned int) _3;
  y_8 = _4 * 4096;
  if (y_8 <= 199)

thus the f.x != 0 test has been folded by one of those $?%&! premature fold-const transforms to

  if ((BIT_FIELD_REF & 1048575) != 0)

The fix is to get rid of those (and fix the "fallout").
[Bug debug/95098] Out of scope variable visible during debugging at Og
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95098
Richard Biener changed:

  What |Removed |Added
  CC   |        |aoliva at gcc dot gnu.org, edlinger at gcc dot gnu.org, rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
Don't see this with gdb:

(gdb) start
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Temporary breakpoint 4 at 0x4004bd: file z.c, line 11.
Starting program: /home/rguenther/obj/gcc/a.out

Temporary breakpoint 4, main () at z.c:11
11 int main() { b(); }
(gdb) s

Breakpoint 3, b () at z.c:4
4 for (g_2 = 21; (g_2 < (-27)); g_2 = 0)
(gdb) p l_9
No symbol "l_9" in current context.
(gdb) info locals
l_10 =

note there _is_ l_9 in the DWARF, even with a location:

 <2>: Abbrev Number: 8 (DW_TAG_lexical_block)
    DW_AT_low_pc : 0xa
    DW_AT_high_pc : 0x0
 <3>: Abbrev Number: 9 (DW_TAG_variable)
    DW_AT_name: l_9
    DW_AT_decl_file : 1
    DW_AT_decl_line : 7
    DW_AT_decl_column : 7
    DW_AT_type: <0xeb>
    DW_AT_location: 10 byte block: 3 0 0 0 0 0 0 0 0 9f (DW_OP_addr: 0; DW_OP_stack_value)

but

:
   0: c7 05 00 00 00 00 15    movl $0x15,0x0(%rip)    # a
   7: 00 00 00
   a: c3                      retq

and certainly the DW_AT_high_pc of the lexical block looks "odd" - the block is not existent. Assembly:

b:
.LFB0:
        .file 1 "z.c"
        .loc 1 2 9 view -0
        .cfi_startproc
        .loc 1 3 5 view .LVU1
        .loc 1 4 5 view .LVU2
        .loc 1 4 14 is_stmt 0 view .LVU3
        movl    $21, g_2(%rip)
        .loc 1 4 20 is_stmt 1 view .LVU4
.LBB2:
        .loc 1 7 2 view .LVU5
.LVL0:
        .loc 1 8 2 view .LVU6
.LBE2:
        .loc 1 10 1 is_stmt 0 view .LVU7
        ret

so you can see .LBB2 to .LBE2 do not contain any actual instructions. GIMPLE we expand from:

b ()
{
  [local count: 1073741824]:
  [z.c:3:5] # DEBUG BEGIN_STMT
  [z.c:4:5] # DEBUG BEGIN_STMT
  [z.c:4:14] g_2 = 21;
  [z.c:4:20] # DEBUG BEGIN_STMT
  [z.c:7:2] # DEBUG BEGIN_STMT
  [z.c:7:7] # DEBUG l_9 => [z.c:7:13] &a
  [z.c:8:2] # DEBUG BEGIN_STMT
  [z.c:8:2] return;

does lldb try to interpret location views yet?
I suppose it might get confused about the is_stmt 0 on the movl and only stop at ret even though the "last" location on that is line 10 (but is_stmt 0 again). It's difficult to produce a meaningful line-number program for the resulting assembler ;)
[Bug tree-optimization/92177] [10 Regression] gcc.dg/vect/bb-slp-22.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92177 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #12 from Richard Biener --- .
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #30 from Richard Biener ---
(In reply to Thomas Koenig from comment #29)
> It is also interesting that this variant
>
> --- a/libgfortran/generated/in_pack_i4.c
> +++ b/libgfortran/generated/in_pack_i4.c
> @@ -88,7 +88,7 @@ internal_pack_4 (gfc_array_i4 * source)
>        count[0]++;
>        /* Advance to the next source element. */
>        index_type n = 0;
> -      while (count[n] == extent[n])
> +      while (n < dim && count[n] == extent[n])
>          {
>            /* When we get to the end of a dimension, reset it and increment
>               the next dimension. */
> @@ -100,7 +100,6 @@ internal_pack_4 (gfc_array_i4 * source)
>            if (n == dim)
>              {
>                src = NULL;
> -              break;
>              }
>            else
>              {
>
> does not get peeled.

More optimal would be

      count[0]--;
      /* Advance to the next source element. */
      index_type n = 0;
      while (count[n] == 0)
        {
          ...
        }

Note completely peeling the inner loop isn't as bad as it looks, it's basically making the whole loop

  while (1)
    {
      for (count[0] = 0; count[0] < extent[0]; ++count[0])
        {
          /* Copy the data. */
          *(dest++) = *src;
          /* Advance to the next element. */
          src += stride0;
        }
      if (dim == 1)
        break;
      count[0] = 0;
      src -= stride[0] * extent[0];
      count[1]++;
      if (count[1] != extent[1])
        continue;
      if (dim == 2)
        break;
      count[1] = 0;
      src -= stride[1] * extent[1];
      count[2]++;
      if (count[2] != extent[2])
        continue;
      if (dim == 3)
        break;
      ...
    }

which should be quite optimal for speed (branch-prediction wise). One might want to try to optimize code size a bit, sure. Sacrificing a bit of speed at the loop exit could be setting extent[n > dim] = 1 and dropping the if (dim == N) break; checks, leaving just the last. Likewise changing the iteration from extent[N] to zero might make the tests smaller. Then, as commented in the code, pre-computing the products might help as well - you get one additional load of course. Interleaving extent and the product data arrays would help cache locality.

Note writing the loop as above will make GCC recognize it as a loop nest.
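As a rough illustration of the down-counting idea from comment #30, the following C sketch packs a strided array into contiguous storage with count[] running from extent[] down to zero. It is not the libgfortran code: pack_down_counting, MAX_RANK and the plain int element type are assumptions made for the example, strides are in elements, and all extents are assumed positive.

```c
#include <stddef.h>

#define MAX_RANK 15   /* assumed upper bound on the rank, for the sketch only */

/* Copy a DIM-dimensional strided array SRC into contiguous DEST.
   count[n] counts down from extent[n], so the end-of-dimension test
   compares against zero instead of reloading extent[n].  */
void
pack_down_counting (int *dest, const int *src, int dim,
                    const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[MAX_RANK];

  if (dim <= 0 || dim > MAX_RANK)
    return;
  for (int n = 0; n < dim; n++)
    {
      if (extent[n] <= 0)
        return;
      count[n] = extent[n];
    }

  while (src)
    {
      /* Copy the data and advance to the next source element.  */
      *dest++ = *src;
      src += stride[0];
      count[0]--;

      int n = 0;
      while (count[n] == 0)
        {
          /* End of this dimension: reset it and step the next one.  */
          count[n] = extent[n];
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;   /* all dimensions exhausted */
              break;
            }
          count[n]--;
          src += stride[n];
        }
    }
}
```

For a 2x3 section of an array with leading dimension 4 (extent {2, 3}, stride {1, 4}), this visits the source offsets 0, 1, 4, 5, 8, 9 in that order.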
[Bug rtl-optimization/95102] New: missed if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95102
Bug ID: 95102
Summary: missed if-conversion
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

If you rewrite gcc.target/i386/pr54855-9.c to the form the GIMPLE takes after some PRE you end up with

  typedef float vec __attribute__((vector_size(16)));

  vec
  foo (vec x, float a)
  {
    if (!(x[0] < a))
      x[0] = a;
    return x;
  }

which is no longer recognized as the same and emits

foo:
.LFB0:
        .cfi_startproc
        comiss  %xmm0, %xmm1
        ja      .L2
        movss   %xmm1, %xmm0
.L2:
        ret

instead of

foo:
.LFB1:
        .cfi_startproc
        minss   %xmm1, %xmm0
        ret

This is because RTL if-conversion does not recognize

    7: r86:SF=vec_select(r84:V4SF,parallel)
    8: flags:CCFP=cmp(r85:SF,r86:SF)
      REG_DEAD r86:SF
    9: pc={(flags:CCFP>0)?L14:pc}
      REG_DEAD flags:CCFP
      REG_BR_PROB 536870916
   10: NOTE_INSN_BASIC_BLOCK 3
   12: r84:V4SF=vec_merge(vec_duplicate(r85:SF),r84:V4SF,0x1)
      REG_DEAD r85:SF
   14: L14:
   15: NOTE_INSN_BASIC_BLOCK 4
   20: xmm0:V4SF=r84:V4SF

The form it does recognize is

    8: r82:SF=vec_select(r84:V4SF,parallel)
    9: flags:CCFP=cmp(r85:SF,r82:SF)
   10: pc={(flags:CCFP>0)?L28:pc}
      REG_DEAD flags:CCFP
      REG_BR_PROB 536870916
   28: L28:
   14: NOTE_INSN_BASIC_BLOCK 3
    5: r85:SF=r82:SF
      REG_DEAD r82:SF
   15: L15:
   16: NOTE_INSN_BASIC_BLOCK 4
   18: r87:V4SF=vec_merge(vec_duplicate(r85:SF),r84:V4SF,0x1)
[Bug rtl-optimization/95102] missed if-conversion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95102 --- Comment #1 from Richard Biener --- OK, so one reason is that if (!can_conditionally_move_p (x_mode)) return FALSE; returns false for E_V4SFmode on x86. min/max detection is based on fp_cmov expansion for scalar FP on x86 though (with its own problems, see PR95083).
[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018 --- Comment #32 from Richard Biener --- Note I don't think the unrolling is excessive - store motion then applying to all count[] and all computations hoisted out of the loop may be a bit too much for register pressure though, especially since we're using flag-based store-motion. But it causes the stores to be materialized on all exits of the loop which means we end up with N*N conditional stores :/ I guess SM could be improved here.
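For readers unfamiliar with flag-based store motion, here is a hand-written C sketch of what the transform does to a single conditional store. It is an illustration only: the functions and the global g are made up for the example, and the real transform works on GIMPLE, not on source code.

```c
#include <stdbool.h>

int g;   /* hypothetical global that is conditionally stored to in a loop */

/* Before: the store to g sits inside the loop.  */
void
store_in_loop (const int *a, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      g = a[i];
}

/* After flag-based store motion: the loop only updates a temporary and
   a flag, and the store is materialized after the loop, guarded by the
   flag, so g is written only if the original loop would have written it.  */
void
store_after_loop (const int *a, int n)
{
  int g_tem = 0;        /* initialized here only to keep the sketch warning-free */
  bool g_set = false;

  for (int i = 0; i < n; i++)
    if (a[i] > 0)
      {
        g_tem = a[i];
        g_set = true;
      }

  if (g_set)
    g = g_tem;
}
```

This loop has a single exit, so one guarded store suffices. With several moved stores and several loop exits, each exit needs its own copy of every guarded store, which is where the N*N conditional stores mentioned in comment #32 come from.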
[Bug c++/95103] Unexpected -Wclobbered in bits/vector.tcc with -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95103 Richard Biener changed: What|Removed |Added Version|unknown |10.1.0 Keywords||diagnostic --- Comment #1 from Richard Biener --- Likely because of the std::vector DTOR invocation which has to access 'v' which is not declared volatile but still "live" across the setjmp. Does it work placing the initial part of the function in a separate { }?
[Bug testsuite/95110] new test case in r11-345 error: gcc.dg/tree-ssa/pr94969.c: dump file does not exist
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95110 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #2 from Richard Biener --- Fixed.
[Bug fortran/95109] [11 regression] ICE in gfortran.dg/gomp/target1.f90 after r11-349
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95109 Richard Biener changed: What|Removed |Added Target Milestone|--- |11.0
[Bug target/95112] i686 procedures have prolog endbr32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95112 --- Comment #1 from Richard Biener --- Try -fcf-protection=none
[Bug tree-optimization/95113] [10/11 Regression] Wrong code w/ -O2 -fexceptions -fnon-call-exceptions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95113 Richard Biener changed: What|Removed |Added Blocks||93385 Priority|P3 |P2 Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93385 [Bug 93385] [10/11 Regression] wrong code with u128 modulo at -O2 -fno-dce -fno-ipa-cp -fno-tree-dce
[Bug middle-end/95108] [9/10/11 Regression] ICE in tree_fits_uhwi_p, at tree.c:7292
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95108 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug fortran/95107] [10/11 Regression] ICE in hash_operand, at fold-const.c:3768
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95107 Richard Biener changed: What|Removed |Added CC||marxin at gcc dot gnu.org Priority|P3 |P2 Target Milestone|--- |10.2
[Bug middle-end/95115] RISC-V 64: inf/inf division optimized out, invalid operation not raised
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115
Richard Biener changed:

  What             |Removed                                                                                 |Added
  Keywords         |                                                                                        |wrong-code
  Target           |riscv64-unknown-linux-gnu                                                               |
  Summary          |[10 Regression] RISC-V 64: inf/inf division optimized out, invalid operation not raised |RISC-V 64: inf/inf division optimized out, invalid operation not raised
  Last reconfirmed |                                                                                        |2020-05-14
  Ever confirmed   |0                                                                                       |1
  Component        |target                                                                                  |middle-end
  Build            |riscv64-unknown-linux-gnu                                                               |
  Host             |riscv64-unknown-linux-gnu                                                               |
  Status           |UNCONFIRMED                                                                             |NEW

--- Comment #6 from Richard Biener ---

  (simplify
   (rdiv @0 @0)
   (if (FLOAT_TYPE_P (type)
        && ! HONOR_NANS (type)
        && ! HONOR_INFINITIES (type))
    { build_one_cst (type); }))

so that's not it, possibly constant folding instead in const_binop. There we only have

      /* Don't perform operation if we honor signaling NaNs and
         either operand is a signaling NaN. */
      if (HONOR_SNANS (mode)
          && (REAL_VALUE_ISSIGNALING_NAN (d1)
              || REAL_VALUE_ISSIGNALING_NAN (d2)))
        return NULL_TREE;

and

      /* Don't perform operation if it would raise a division
         by zero exception. */
      if (code == RDIV_EXPR
          && real_equal (&d2, &dconst0)
          && (flag_trapping_math || ! MODE_HAS_INFINITIES (mode)))
        return NULL_TREE;

which both don't trigger. Afterwards

      inexact = real_arithmetic (&value, code, &d1, &d2);

even returns false and the result is a qNaN.

For the specific regression in this bug we now simply are able to turn return u.x/v.x; into a division of two constants. That's nothing we're going to "fix", so we have to fix the above instead which is a much older issue.
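The gap described in comment #6 can be illustrated with a small stand-alone C predicate. This is a sketch under loud assumptions, not GCC's const_binop and not the eventual fix: may_fold_division and is_infinite are hypothetical, operands are plain doubles rather than REAL_VALUE_TYPE, the existing division-by-zero check is simplified, and the inf/inf check is only one conceivable way of closing the gap.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper: true for +inf and -inf, false for NaN and
   finite values.  */
static bool
is_infinite (double x)
{
  return x > DBL_MAX || x < -DBL_MAX;
}

/* Hypothetical stand-in for the decision whether a division of two
   constants may be folded at compile time.  */
bool
may_fold_division (double d1, double d2, bool trapping_math)
{
  if (!trapping_math)
    return true;

  /* Simplified form of the existing check: folding would hide a
     division-by-zero exception.  */
  if (d2 == 0.0)
    return false;

  /* Assumed additional check: inf/inf raises the invalid operation
     exception at run time, so folding would hide that as well.  */
  if (is_infinite (d1) && is_infinite (d2))
    return false;

  return true;
}
```

The point of the sketch is only that the existing checks cover signaling NaNs and division by zero, while inf/inf falls through to real_arithmetic and gets folded to a qNaN.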
[Bug tree-optimization/95118] [10/11 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95118
Richard Biener changed:

  What             |Removed                                                |Added
  Priority         |P3                                                     |P2
  Keywords         |                                                       |compile-time-hog
  Assignee         |unassigned at gcc dot gnu.org                          |rguenth at gcc dot gnu.org
  Status           |UNCONFIRMED                                            |ASSIGNED
  Target Milestone |---                                                    |10.2
  Summary          |gcc-10 and master -O3 -fopt-info-vec causes gcc to hang |[10/11 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang
  Ever confirmed   |0                                                      |1
  Last reconfirmed |                                                       |2020-05-14
  Known to work    |                                                       |9.3.0

--- Comment #5 from Richard Biener ---
On the GCC 10 branch I see it not returning from

(gdb) fin
Run till exit from #0 0x0107148d in real_to_decimal_for_mode ( str=0x7fffcd60 "\200", r_orig=0x7fffcd40, buf_size=100, digits=57, crop_trailing_zeros=1, mode=E_VOIDmode) at /space/rguenther/src/gcc-10-branch/gcc/real.c:1718

we're in this loop:

      while (1)
        {
          /* Stop if we'd shift bits off the bottom. */
          if (v.sig[0] & 7)
            break;

          do_multiply (&u, &v, ten);

          /* Stop if we're now >= 1. */
          if (REAL_EXP (&u) > 0)
            break;

          v = u;
          dec_exp -= 1;
        }

(gdb) p u
$1 = {cl = 0, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 0, sig = {0, 0, 0}}

and the original REAL_VALUE_TYPE is

(gdb) p *r_orig
$4 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 67092486, sig = {0, 0, 0}}

so it's simply a weird not normalized constant zero ... I have a patch to paper over this in real_to_decimal_for_mode which then prints a (0.0e-8191); for the specific number.
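Why a zero significand makes that loop spin can be shown with a toy analogue in C. This is not real.c and not the patch mentioned above: decimal_exponent is a hypothetical function working on a plain double, and the guard at its top merely stands in for whatever check the real fix performs.

```c
#include <stdbool.h>

/* Toy analogue of the scaling loop: multiply V by ten until it is
   >= 1 and report the decimal exponent that took.  Returns false
   for values the loop cannot handle.  */
bool
decimal_exponent (double v, int *dec_exp)
{
  /* Guard: for zero, v * 10 stays zero, so the "now >= 1" exit below
     would never trigger and the loop would run forever.  Negative
     values and NaN are rejected as well.  */
  if (!(v > 0.0))
    return false;

  int e = 0;
  while (v < 1.0)
    {
      v *= 10.0;
      e -= 1;
    }
  *dec_exp = e;
  return true;
}
```

In the real loop the other exit tests the low bits of the significand, which are also all zero for the value shown in the gdb output, so neither exit is ever taken.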
[Bug bootstrap/95122] Cross-compile arm32 toolchain with hard float, but Error in gcc final
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95122 Richard Biener changed: What|Removed |Added Target||arm-linux-gnueabihf Ever confirmed|0 |1 Status|UNCONFIRMED |WAITING Last reconfirmed||2020-05-14 --- Comment #1 from Richard Biener --- You seem to build from inside the source directory, that is not supported. Please create a separate object directory like mkdir obj cd obj ../configure and re-try.
[Bug tree-optimization/95118] [10 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95118
Richard Biener changed:

  What          |Removed                                                                    |Added
  Known to work |                                                                           |11.0
  Summary       |[10/11 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang |[10 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang
  Known to fail |                                                                           |10.1.0

--- Comment #7 from Richard Biener ---
Fixed on trunk so far.
[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703 --- Comment #12 from Richard Biener --- (In reply to pskocik from comment #11) > Thanks for the shot at a fix, Richard Biener. > > Since I have reported this, I think I should mentioned a related > suboptimality that should probably be getting fixed alongside with this (if > this one is getting fixed), namely that while > > > int64_t zextend_int_to_int64_nospill(int *X) > { > union { int64_t _; } r = {0}; return memcpy(&r._,X,sizeof(*X)),r._; > } > > (and hopefully later even > > int64_t zextend_int_to_int64_spill(int *X) { int64_t r = {0}; return > memcpy(&r,X,sizeof(*X)),r; } > ) > > generates, on x86_64, the optimal > > zextend_int_to_int64_nospill: > mov eax, DWORD PTR [rdi] > ret > > for zeroextending promotions of sub-int types, an extra xor instruction gets > generated, e.g.: > > > int64_t zextend_short_to_int64_nospill_but_suboptimal(short *X) > { > union { int64_t _; } r ={0}; return memcpy(&r._,X,sizeof(*X)),r._; > } > > => > > zextend_short_to_int64_nospill_but_suboptimal: > xor eax, eax > mov ax, WORD PTR [rdi] > ret > > which was surprising to me because it doesn't happen with zero-extending > memcpy-based promotion from {,u}ints to larger types ({,u}{,l}longs). > > https://gcc.godbolt.org/z/ZjXaCw I think this is PR93507 for which I have a patch queued as well.
[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703 --- Comment #13 from Richard Biener --- (In reply to r...@cebitec.uni-bielefeld.de from comment #10) > > --- Comment #9 from Richard Biener --- > [...] > > Hmm, OK looks like memcpy is not folded, likely because the source is > > not known to be appropriately aligned. > [...] > > should fix this. Can you verify and if so, commit? Thx. > > Unfortunately, it doesn't. OK, this only helps a bit later since CCP is required to propagate the alignment, the following forwprop pass to elide the memcpy and then finally the update-address-taken invocation in the _second_ CCP pass after inlining will have pr94703.c.093t.ccp2:No longer having address taken: r I've long pondered to remove the memcpy folding restriction for strict-align targets but never went through. I'll update the testcase to require /* { dg-require-effective-target non_strict_align } */
[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703 Richard Biener changed: What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED --- Comment #15 from Richard Biener --- Fixed.
[Bug target/94087] std::random_device often fails when used from multiple threads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087
Richard Biener changed:

  What             |Removed     |Added
  Ever confirmed   |0           |1
  Status           |UNCONFIRMED |NEW
  Last reconfirmed |            |2020-05-14

--- Comment #10 from Richard Biener ---
So it looks like the rdseed usage is new in GCC 10 libstdc++ and it prevails over the previous rdrand support if supported on your CPU. I can reproduce this on a CPU with rdseed support and libstdc++ from GCC 10. The code invoked looks correct to me:

  20: 83 e8 01    sub    $0x1,%eax
  23: 74 12       je     37 <_ZNSt12_GLOBAL__N_112__x86_rdseedEPv+0x37>
  25: f3 90       pause
  27: 0f c7 fa    rdseed %edx
  2a: 89 11       mov    %edx,(%rcx)
  2c: 73 f2       jae    20 <_ZNSt12_GLOBAL__N_112__x86_rdseedEPv+0x20>

The number of tries libstdc++ does is 100. Note rdrand doesn't exhibit this issue. So it might very well be a hardware limitation.

Btw, the reproducer can be "enhanced" by providing the method of operation:

  std::random_device rd("rdseed");

That makes sure it fails in a different way on a CPU without rdseed support (rdseed needs Intel Broadwell or later, or AMD Zen).
[Bug target/94087] std::random_device often fails when used from multiple threads
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087 Richard Biener changed: What|Removed |Added CC||hjl.tools at gmail dot com, ||redi at gcc dot gnu.org --- Comment #11 from Richard Biener --- HJ, is what libstdc++ does "unreasonable" (it uses rdseed by default if available) and could it do better? Can you reproduce the issue? The docs quoted by Andrew suggest that libstdc++ should, when retries are not enough, fall back to another method.
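The "retry, then fall back" behaviour suggested in comment #11 could look roughly like the following C sketch. It is not libstdc++ code: entropy_source, random_with_fallback and the two stub sources are hypothetical, and a real implementation would plug an rdseed-based source in as primary and, for example, an rdrand-based one as fallback.

```c
#include <stdbool.h>

/* Hypothetical interface: a source writes one value to *OUT and
   returns true, or returns false if it has nothing to deliver.  */
typedef bool (*entropy_source) (unsigned int *out);

/* Try PRIMARY up to RETRIES times; if it never succeeds, use
   FALLBACK instead of reporting failure right away.  */
bool
random_with_fallback (entropy_source primary, entropy_source fallback,
                      unsigned int retries, unsigned int *out)
{
  for (unsigned int i = 0; i < retries; i++)
    if (primary (out))
      return true;
  return fallback (out);
}

/* Stand-ins for demonstration only: a source that is always exhausted
   and a source that always succeeds with a fixed value.  */
bool
exhausted_source (unsigned int *out)
{
  (void) out;
  return false;
}

bool
fixed_source (unsigned int *out)
{
  *out = 42u;
  return true;
}
```

With the stubs above, random_with_fallback (exhausted_source, fixed_source, 100, &v) gives up on the primary after 100 tries and takes the value from the fallback.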
[Bug c/95126] Missed opportunity to turn static variables into immediates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95126 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Keywords||missed-optimization Ever confirmed|0 |1 Last reconfirmed||2020-05-14 --- Comment #1 from Richard Biener --- Confirmed. Only RTL expansion sees the aggregate copy involved with the function call and this, when folded from a constant initializer, is not subject to clever things such as merging of stores.
[Bug target/95125] Unoptimal code for vectorized conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125
Richard Biener changed:

  What             |Removed     |Added
  Version          |unknown     |11.0
  Ever confirmed   |0           |1
  Last reconfirmed |            |2020-05-14
  Target           |            |x86_64-*-* i?86-*-*
  Keywords         |            |missed-optimization
  Status           |UNCONFIRMED |NEW

--- Comment #1 from Richard Biener ---
ISTR I filed a duplicate 10 years ago or so. The issue is the vectorizer could not handle V4DFmode -> V4SFmode conversions. Could, because for SVE we added the capability but this requires additional instruction patterns (IIRC I filed a bug about this last year). Yep. PR92658 it is.
[Bug rtl-optimization/95123] [10/11 Regression] Wrong code w/ -O2 -fselective-scheduling2 -funroll-loops --param early-inlining-insns=5 --param loop-invariant-max-bbs-in-loop=3 --param max-jump-thread
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95123 Richard Biener changed: What|Removed |Added Target Milestone|--- |10.2
[Bug pch/95131] Instantiate templates at pch generation time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95131 Richard Biener changed: What|Removed |Added CC||nathan at gcc dot gnu.org --- Comment #1 from Richard Biener --- Modules are the future, not sure how this applies there.
[Bug rtl-optimization/11832] Optimization of common stores in switch statements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11832 Bug 11832 depends on bug 33315, which changed state. Bug 33315 Summary: stores not commoned by sinking https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/33315] stores not commoned by sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #16 from Richard Biener --- Fixed on trunk. Individual missed cases should be tracked by separate bugreports.
[Bug other/16996] [meta-bug] code size improvements
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16996 Bug 16996 depends on bug 33315, which changed state. Bug 33315 Summary: stores not commoned by sinking https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED