[Bug lto/115432] Building a program with -flto generates wrong code (missing the call to a function) unless -fno-strict-aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115432 Richard Biener changed: What|Removed |Added Resolution|--- |INVALID Status|UNCONFIRMED |RESOLVED --- Comment #2 from Richard Biener ---

struct file_output_stream {
  union {
    void *voidp;
    int fd;
  } data;
  const output_stream_vtbl *vtbl;
};

struct output_stream {
  void *data;
  const output_stream_vtbl *vtbl;
};

Those are two unrelated types. Doing

  ((file_output_stream *)p)->vtbl = x;
  ... = ((output_stream *)p)->vtbl;

is invoking undefined behavior (unless -fno-strict-aliasing).
[Bug lto/115432] Building a program with -flto generates wrong code (missing the call to a function) unless -fno-strict-aliasing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115432 --- Comment #1 from Richard Biener --- In case output_stream is not the same as, or derived from, file_output_stream, or does not contain a file_output_stream object as its first member, you invoke undefined behavior when the calls that follow might read from the object via output_stream or an altogether different type (buffer_output_stream?).
[Bug tree-optimization/115426] ICE: in execute_todo, at passes.cc:2138
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115426 --- Comment #3 from Richard Biener --- I think this is a gimplification failure. 'r' is neither TREE_ADDRESSABLE nor DECL_NOT_GIMPLE_REG and the =X constraint results in both allow_reg and allow_mem, but we gimplify it as is_gimple_lvalue which should, as the base is a gimple register, emit a component extract to pre_p and a complex build to post_p. gimplify_compound_lval correctly sees this and forces a register argument to the __imag operation but I'm not sure that's enough for lvalues. IIRC a simple __imag x = 1; also doesn't have DECL_NOT_GIMPLE_REG on 'x', and gimplify_compound_lval behaves the same. Still we eventually gimplify to

  _1 = REALPART_EXPR <x>;
  x = COMPLEX_EXPR <_1, 1.0e+0>;
  D.2772 = x;

which is done via gimplify_modify_expr_complex_part. That suggests it's gimplify_asm_expr that would need to do this very same thing as we seem to rely on this for correctness. With "=r" we correctly gimplify to

  __asm__("" : "=r" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;
  D.2773 = r;
[Bug tree-optimization/115426] ICE: in execute_todo, at passes.cc:2138
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115426 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #2 from Richard Biener --- Mine (into-SSA is broken it seems)
[Bug tree-optimization/115423] Inlined strcmp optimizes poorly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115423 --- Comment #2 from Richard Biener --- You could also say rtl-optimization does a bad job with the inlined version. Or we should inline strcmp on GIMPLE to get the first char optimized. Consider strcmp (c, "ABCDEFGHabcdefgh") || strcmp (c, "ABCDEFGHfoobar") thus strings with a common prefix which we could optimize as strncmp (c, "ABCDEFGH", 8) && (strcmp (c+8, "abcdefgh") || strcmp (c+8, "foobar")) as a more general transform. I should say inline_string_cmp should consider using larger unaligned reads as well.
[Bug libstdc++/58909] C++11's condition variables fail with static linking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58909 Richard Biener changed: What|Removed |Added CC||ilg at livius dot net --- Comment #28 from Richard Biener --- *** Bug 115421 has been marked as a duplicate of this bug. ***
[Bug libstdc++/115421] Multi-threaded condition_variable app throws when linking as -static on Linux
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115421 Richard Biener changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |DUPLICATE --- Comment #16 from Richard Biener --- dup then *** This bug has been marked as a duplicate of bug 58909 ***
[Bug bootstrap/115416] [13/14/15 regression] Setting --includedir to a nonexistent directory causes a build error since r13-5490-g59e4c98173a79f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115416 Richard Biener changed: What|Removed |Added Version|unknown |14.1.0 Target Milestone|--- |13.4
[Bug tree-optimization/115427] fallback for interclass mathfn bifs like isinf, isfinite, isnormal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427 --- Comment #2 from Richard Biener --- The canonical way would be to handle these in the ISEL pass and remove the (fallback) expansion. But then we can see whether the expander FAILs (ideally expanders would never be allowed to FAIL, and for FAILing expanders we'd have a way to query the target like we have the vec_perm_const hook). But I'll note that currently the expanders may FAIL but then we expand to a call rather than the inline-expansion (and for example AVR relies on this now to avoid early folding of isnan). So - for the cases of isfinite and friends without a fallback call I would suggest expanding from ISEL to see whether it FAILs and throwing away the result (similar to how IVOPTs probes things). Or make those _not_ allowed to FAIL? Why would they fail to expand anyway?
[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #6 from Richard Biener --- Fixed. Unfortunately this didn't fix PR115256 if I checked correctly. Keep searching!
[Bug middle-end/115405] wrong code with _BitInt() sign-extension with -fno-strict-aliasing -O1 and above
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115405 --- Comment #3 from Richard Biener --- It's not visible but I assume that _4 doesn't have _BitInt(17) type? The

  if (known_eq (offset, 0)
      && !reverse
      && poly_int_tree_p (TYPE_SIZE (type), &type_size)
      && known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size))

check tries to assess that no extension is required, does it work if you adjust that for the _BitInt case? OTOH the reduce_bit_field handling in VIEW_CONVERT_EXPR expansion looks misplaced - shouldn't it be before the INTEGRAL_TYPE_P handling?
[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #8 from Richard Biener --- Fixed.
[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388 --- Comment #4 from Richard Biener --- It's DSE5 deleting

  Deleted dead store: a[b.19_216] = 1;

there's a big irreducible region following the loop with this store, but I fail to see how we can reach the load without going through the other redundant store. Ah, wait - it's the same as with loops in irreducible regions and triggering a latent issue. We do

  /* If we visit this PHI by following a backedge then we have to
     make sure ref->ref only refers to SSA names that are invariant
     with respect to the loop represented by this PHI node.  */
  if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
                      gimple_bb (use_stmt))
      && !for_each_index (ref->ref ? &ref->ref : &ref->base,
                          check_name, gimple_bb (use_stmt)))
    return DSE_STORE_LIVE;

but we identify backedges by using dominators which only works for natural loops and not irreducible regions. We have to either disregard all refs in irreducible regions or check for invariantness in the irreducible (sub-)region spanned by the PHI and the backedge source. I'm going to check the latter.
[Bug debug/115386] ice with -g -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115386 --- Comment #8 from Richard Biener --- (In reply to David Binderman from comment #7)
> (In reply to Richard Biener from comment #6)
> > Are you using a compiler with release checking?
>
> No, with asan & ubsan.
>
> I tried running cc1 under gdb and got this backtrace:
>
> #0 0x00b54615 in gt_ggc_mx_rtx_def (x_p=0x7fffe939bd00) at gtype-desc.cc:323
> #1 0x00b54829 in gt_ggc_mx_rtx_def (x_p=) at gtype-desc.cc:940
> #2 0x00b55405 in gt_ggc_mx_rtx_def (x_p=) at gtype-desc.cc:717
> #3 0x00b55405 in gt_ggc_mx_rtx_def (x_p=) at gtype-desc.cc:717
> #4 0x00b55405 in gt_ggc_mx_rtx_def (x_p=) at gtype-desc.cc:717
> #5 0x00b55405 in gt_ggc_mx_rtx_def (x_p=) at gtype-desc.cc:717
>
> That continues on for a depth of more than 1000 frames.

Yes, the garbage-collection marking can be deeply recursive. I guess asan/ubsan cause the marker functions to consume more stack. The issue can likely be reproduced even without asan/ubsan by lowering the stack size, though I'm not sure how much the frame size of gt_ggc_mx_rtx_def explodes with asan/ubsan (or other functions in gtype-desc.cc). It might make sense to exempt gtype-desc.cc from asan/ubsan instrumentation. Lowering the stack size to 1MB down from 8MB still doesn't make it reproduce without UBSAN/ASAN ...
[Bug middle-end/115411] ICE : in expand_call, at calls.cc:3668
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115411 Richard Biener changed: What|Removed |Added Component|c |middle-end Keywords||ice-on-valid-code --- Comment #1 from Richard Biener --- I think there are related bugs where error recovery for the

  /root/gdbtest/gcctest/gcc_llvm/gcc/z2.cc:5:5: sorry, unimplemented: passing too large argument on stack
      5 | f (*x);
        | ~~^~~~

error isn't fool-proof.
[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395 --- Comment #6 from Richard Biener --- In fact, the main loop ends up not using SLP but the epilogue one does and we end up setting STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT which we do not support for SLP. The question is whether to add that support or simply fail (but this is code generation). It's probably easiest to transitionally implement support and rip it out again later.
[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395 --- Comment #5 from Richard Biener --- It needs epilogue vectorization to trigger and it's the path re-using the vector accumulator from the earlier loop that goes wrong when the main vector loop is skipped. We apply the initial value adjustment to the scalar result but the continuation fails to do this and the vector epilogue expects the earlier code to have done it. IIRC we force "optimization" of this to be disabled but obviously somehow fail to do this for SLP.
[Bug debug/115386] ice with -g -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115386 Richard Biener changed: What|Removed |Added Version|unknown |15.0 --- Comment #6 from Richard Biener --- Are you using a compiler with release checking? Stack overflow with the GGC recursion might depend on not collecting too often as it would happen with checking enabled. I don't see expand taking much time on x86_64, most is IL verification and if that's disabled sched2.
[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 --- Comment #4 from Richard Biener --- (In reply to Robin Dapp from comment #3) > For the record - the hunk before bootstrapped and regtested on the cfarm > machines and tested successfully on aarch64 qemu with sve. I still need to > set up a regtest environment with SME. I think the patch is OK, so I suggest to post it and CC Richard S. so he can chime in.
[Bug target/115404] [15 Regression] possibly wrong code on glibc-2.39 since r15-1113-gde05e44b2ad963
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115404 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 Target||x86_64-*-* i?86-*-*
[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #4 from Richard Biener --- Mine.
[Bug lto/115394] ICE in lto_read_decls for a minimal C test-case with streamer_debugging set to true
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115394 Richard Biener changed: What|Removed |Added Keywords||internal-improvement --- Comment #1 from Richard Biener --- I'm quite sure streamer_debugging was never updated after the rewrite a few years ago. I'd suggest removing all traces of it; it's a very weak bit of debugging that it adds on top of the existing consistency checks.
[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Version|unknown |15.0 --- Comment #3 from Richard Biener --- Ah, finally a small testcase. I'll have a look.
[Bug rtl-optimization/115384] [15 Regression] ICE: RTL check: expected code 'const_int', have 'const_wide_int' in simplify_binary_operation_1, at simplify-rtx.cc:4088 since r15-1047-g7876cde25cbd2f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115384 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 Richard Biener changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #8 from Richard Biener --- Should be fixed now.
[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 --- Comment #2 from Richard Biener --- I think it should work, but there's also prepare_vec_mask which is using a cache but I have no idea whether this is applicable for non-load/store and whether there's extra work to be done for it to be usable. Richard?
[Bug tree-optimization/115385] Peeling for gaps can be optimized more or needs to peel more than one iteration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115385 Richard Biener changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2024-06-07 Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- Mine.
[Bug tree-optimization/115385] New: Peeling for gaps can be optimized more or needs to peel more than one iteration
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115385 Bug ID: 115385 Summary: Peeling for gaps can be optimized more or needs to peel more than one iteration Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: ---

Consider

void __attribute__((noipa))
foo (unsigned char * __restrict x, unsigned char *y, int n)
{
  for (int i = 0; i < n; ++i)
    {
      x[16*i+0] = y[3*i+0];
      x[16*i+1] = y[3*i+1];
      x[16*i+2] = y[3*i+2];
      x[16*i+3] = y[3*i+0];
      x[16*i+4] = y[3*i+1];
      x[16*i+5] = y[3*i+2];
      x[16*i+6] = y[3*i+0];
      x[16*i+7] = y[3*i+1];
      x[16*i+8] = y[3*i+2];
      x[16*i+9] = y[3*i+0];
      x[16*i+10] = y[3*i+1];
      x[16*i+11] = y[3*i+2];
      x[16*i+12] = y[3*i+0];
      x[16*i+13] = y[3*i+1];
      x[16*i+14] = y[3*i+2];
      x[16*i+15] = y[3*i+0];
    }
}

and

void __attribute__((noipa))
bar (unsigned char * __restrict x, unsigned char *y, int n)
{
  for (int i = 0; i < n; ++i)
    {
      x[16*i+0] = y[5*i+0];
      x[16*i+1] = y[5*i+1];
      x[16*i+2] = y[5*i+2];
      x[16*i+3] = y[5*i+3];
      x[16*i+4] = y[5*i+4];
      x[16*i+5] = y[5*i+0];
      x[16*i+6] = y[5*i+1];
      x[16*i+7] = y[5*i+2];
      x[16*i+8] = y[5*i+3];
      x[16*i+9] = y[5*i+4];
      x[16*i+10] = y[5*i+0];
      x[16*i+11] = y[5*i+1];
      x[16*i+12] = y[5*i+2];
      x[16*i+13] = y[5*i+3];
      x[16*i+14] = y[5*i+4];
      x[16*i+15] = y[5*i+0];
    }
}

For both loops we currently cannot reduce the access for the load from 'y' to not touch extra elements so we force peeling for gaps. But in both cases peeling a single scalar iteration is not sufficient and we get

  t.c:5:21: note: ==> examining statement: _3 = y[_1];
  t.c:5:21: missed: peeling for gaps insufficient for access
  t.c:7:20: missed: not vectorized: relevant stmt not supported: _3 = y[_1];

We can avoid this excessive peeling for gaps if we narrow the load from 'y' to the next power-of-two size, where it is then always sufficient to peel just a single scalar iteration. When the target cannot construct a vector with those elements we'd have to peel more than one iteration.
[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 --- Comment #4 from Richard Biener --- Created attachment 58378 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58378=edit patch I'm testing this, but I do not have hardware to test correctness (and qemu not set up).
[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 Richard Biener changed: What|Removed |Added CC||rsandifo at gcc dot gnu.org --- Comment #3 from Richard Biener --- So we're now doing an EXTRACT_LAST_REDUCTION with multiple stmt copies which is disallowed for non-SLP (by accident?). It shows it of course doesn't work since we end up removing the scalar reduction stmt multiple times.

  [local count: 860067202]:
  # j_12 = PHI
  # i_14 = PHI
  # vect_vec_iv_.9_45 = PHI <_46(8), _47(28)>
  _46 = vect_vec_iv_.9_45 + { 16, 16, 16, 16 };
  _48 = vect_vec_iv_.9_45 + { 4, 4, 4, 4 };
  _49 = _48 + { 4, 4, 4, 4 };
  _50 = _49 + { 4, 4, 4, 4 };
  vect__1.10_51 = (vector(4) float) vect_vec_iv_.9_45;
  vect__1.10_52 = (vector(4) float) _48;
  vect__1.10_53 = (vector(4) float) _49;
  vect__1.10_54 = (vector(4) float) _50;
  mask__3.11_55 = vect__1.10_51 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_56 = vect__1.10_52 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_57 = vect__1.10_53 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_58 = vect__1.10_54 < { 0.0, 0.0, 0.0, 0.0 };
  j_2 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_55, vect_vec_iv_.9_45);

and we removed the old

  j_2 = _3 ? i_14 : j_12;

we are about to insert

  j_2 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_56, _48);

I think correct would be

  j_59 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_55, vect_vec_iv_.9_45);
  j_60 = .FOLD_EXTRACT_LAST (j_59, mask__3.11_56, _48);
  j_61 = .FOLD_EXTRACT_LAST (j_60, mask__3.11_57, _49);
  j_2 = .FOLD_EXTRACT_LAST (j_61, mask__3.11_58, _50);

I'm testing a patch.
[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Priority|P3 |P1 --- Comment #2 from Richard Biener --- I can reproduce.
[Bug tree-optimization/115382] New: Wrong code with in-order conditional reduction and masked loops
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382 Bug ID: 115382 Summary: Wrong code with in-order conditional reduction and masked loops Product: gcc Version: 15.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rguenth at gcc dot gnu.org Target Milestone: ---

vectorize_fold_left_reduction does

  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
    mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num,
                               vectype_in, i);
  else if (is_cond_op)
    mask = vec_opmask[i];

that doesn't work - both masks have to be combined. This for example shows in a runfail of gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c with -march=cascadelake --param vect-partial-vector-usage=2 on x86_64. The len-masking code looks good.
[Bug tree-optimization/115381] Missed deoptimization opportunity when comparing two different linker symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115381 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-06-07 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 CC||hubicka at gcc dot gnu.org --- Comment #2 from Richard Biener --- Doesn't seem to help here. Related testcase:

extern int x;
extern int y;
int z () { return &x == &y; }

possibly -fno-semantic-interposition doesn't cover the definitions being aliases of each other. Defining TU:

int x () {}
int __attribute__((alias("x"))) y ();

I believe this is wrong-code from clang.
[Bug tree-optimization/115381] Missed deoptimization opportunity when comparing two different linker symbols
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115381 --- Comment #1 from Richard Biener --- -fno-semantic-interposition
[Bug target/115373] [15 Regression] RISCV slp-cond-2-big-array.c slp-cond-2.c scan-tree-dump fails since r15-859-geaaa4b88038
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115373 Richard Biener changed: What|Removed |Added Target|riscv |riscv, aarch64 --- Comment #3 from Richard Biener --- Same on aarch64.
[Bug target/115375] [15 Regression] RISCV scan failures since 2024-05-04
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115375 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Target||riscv Keywords||testsuite-fail Target Milestone|--- |15.0 --- Comment #1 from Richard Biener --- Yes, I've seen these in the precommit CI, scan-assembler are notoriously difficult to "adjust" and even analyze. I left this to risc-v folks assuming they are fine with this as Richard was fine doing the same for arm.
[Bug c/115374] fmod() in x86_64 -O3 not using return value from the glibc's implementation if x87 FPU fprem returns NaN
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115374 --- Comment #9 from Richard Biener --- Yep, it's call DCE which elides the errno setting function call iff the result is not NaN.
[Bug target/115373] [15 Regression] RISCV slp-cond-2-big-array.c slp-cond-2.c scan-tree-dump fails since r15-859-geaaa4b88038
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115373 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 CC||rguenth at gcc dot gnu.org Target||riscv Keywords||testsuite-fail --- Comment #2 from Richard Biener --- This also wasn't seen in precommit CI. I can confirm it on trunk and the issue is that we prefer load-lanes for f3 instead of SLP. This issue will go away when we do load-lanes from SLP, so it's intermittent (but I can't promise any timeline). I wonder if the FAIL also occurs on aarch64. There's vect_load_lanes to eventually "fix" the FAIL by adjusting the testcase expectation.
[Bug target/115372] [15 Regression] RISCV pr97428.c scan-tree-dump-times after r15-812-gc71886f2ca2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115372 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 Keywords||testsuite-fail CC||rguenth at gcc dot gnu.org Target||riscv --- Comment #2 from Richard Biener --- I don't remember seeing FAIL: gcc.dg/vect/pr97428.c in the precommit CI, this one should get one SLP instance and seeing zero means it now fails to SLP on RISC-V. With a cross and rv64gcv I don't see this failure (on top of trunk). Ah, for me it's XFAILed because of ! vect_hw_misalign - do you use additional flags? But even adding -mno-strict-align doesn't help. Oh, the dejagnu harness uses check_effective_target_riscv_v_misalign_ok which _runs_ a testcase ... which of course fails for my simple cc1 cross (w/o binutils and w/o qemu set up). Is the precommit CI any better here?
[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 Keywords||missed-optimization
[Bug other/115365] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|UNCONFIRMED |RESOLVED --- Comment #3 from Richard Biener --- Fixed I assume.
[Bug c++/115364] [11/12/13/14/15 Regression] ICE-on-invalid when calling non-const template member on const object
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115364 Richard Biener changed: What|Removed |Added Priority|P3 |P4
[Bug tree-optimization/115363] Missing loop vectorization due to loop bound load not being pulled out
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115363 Richard Biener changed: What|Removed |Added Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-06-06 --- Comment #1 from Richard Biener --- Invariant motion doesn't do versioning for aliasing. In fact, once the loop iterates, array[k] can no longer alias this->size, but this is difficult to exploit (peeling the loop once would help). I'm not sure we should start to version all those loops where the exit condition depends on an invariant but not hoistable expression? But maybe we can diagnose this so people can rewrite their code.
[Bug target/115362] fixed_size_simd dot product recognition and sign of determinant not working for stdx::reduce
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-06-06 Status|UNCONFIRMED |WAITING Ever confirmed|0 |1 --- Comment #12 from Richard Biener --- How should I compile this?

> /space/rguenther/install/gcc-14.1/bin/g++ t.C -std=gnu++2b -mavx2

t.C: In function ‘int main(int, char**)’:
t.C:105:29: warning: ignoring attributes on template argument ‘__m128’ [-Wignored-attributes]
  105 |   std::array<__m128, 3> sse =
      |                             ^
In file included from /spc/space/rguenther/install/gcc-14.1/lib64/gcc/x86_64-pc-linux-gnu/14.1.0/include/immintrin.h:39,
                 from /spc/space/rguenther/install/gcc-14.1/lib64/gcc/x86_64-pc-linux-gnu/14.1.0/include/x86intrin.h:32,
                 from /spc/space/rguenther/install/gcc-14.1/include/c++/14.1.0/experimental/bits/simd.h:45,
                 from /spc/space/rguenther/install/gcc-14.1/include/c++/14.1.0/experimental/simd:74,
                 from t.C:4:
t.C: In static member function ‘static constexpr T math::vec::storage::dot_sse(FIRST, OTHER&& ...) [with FIRST = __vector(4) float; OTHER = {__vector(4) float&, __vector(4) float&}; T = float; long unsigned int N = 3]’:
t.C:46:91: error: the last argument must be an 8-bit immediate
   46 |     constexpr T dot_sse(FIRST first, OTHER&&... other) { return _mm_dp_ps(first, (... * std::forward(other)), mask4dp(N))[0]; }
      |                                                                                           ^
[Bug lto/115359] ICE in warn_types_mismatch: lto1: internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115359 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org, ||rguenth at gcc dot gnu.org --- Comment #1 from Richard Biener --- The issue is probably get_odr_name_for_type returning sth non-NULL for both. But yeah, duping before copying looks wrong since we seem to expect NULL eventually.

  if (name1 = cplus_demangle (odr1, opts))
    {
      name1 = xstrdup (name1);
      ...

might be even better. Honza?
[Bug c++/115358] [13/14/15 Regression] template argument deduction/substitution failed in generic lambda function use of static constexpr array type whos initializer defines the size
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355 Richard Biener changed: What|Removed |Added Target||powerpc64le Keywords||wrong-code --- Comment #2 from Richard Biener --- wild guess - store-with-len with bogus initial len/bias value?
[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932 --- Comment #10 from Richard Biener --- I think the question is why IVOPTs ends up using both the signed and unsigned variant of the same IV instead of expressing all uses of both with one IV? That's where I'd look first.
[Bug tree-optimization/115354] [14/15 Regression] Large -Os code size increase related to -ftree-sra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115354 Richard Biener changed: What|Removed |Added Summary|Large -Os code size |[14/15 Regression] Large |increase related to |-Os code size increase |-ftree-sra |related to -ftree-sra Target Milestone|--- |14.2 CC||jamborm at gcc dot gnu.org Keywords||missed-optimization --- Comment #1 from Richard Biener --- The optimization is performed optimistically, anticipating follow-up optimizations to make up for the immediate bloat it causes (that's my understanding). I'm not sure we make any attempt at assessing how likely that is to happen, but certainly this transform could be disabled when optimizing for size or for cold calls?
[Bug rtl-optimization/115351] [14/15 regression] pointless movs when passing by value on x86-64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351 Richard Biener changed: What|Removed |Added Target||x86_64-*-* Summary|[14 regression] pointless |[14/15 regression] |movs when passing by value |pointless movs when passing |on x86-64 |by value on x86-64 Target Milestone|--- |14.2 Ever confirmed|0 |1 Status|UNCONFIRMED |NEW Last reconfirmed||2024-06-05 Component|c++ |rtl-optimization Keywords||missed-optimization, ||needs-bisection --- Comment #1 from Richard Biener --- Confirmed. The IL we expand from is the same.
[Bug tree-optimization/115347] [12/13/14/15 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115347 Richard Biener changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=112859 Version|unknown |14.1.1 --- Comment #2 from Richard Biener --- It's loop distribution doing

  t2.c:7:12: optimized: Loop nest 1 distributed: split to 2 loops and 0 library calls.

We get

  for (; f < 1; f++)
    {
      for (h = 0; h < 2; h++)
        {
          d = e[f];
        }
    }

  for (; f < 1; f++)
    {
      for (h = 0; h < 2; h++)
        {
          g = e[1].c;
          e[f].c = 1;
        }
    }

I think this is similar to the other still open issue where zero-distance inner loop dependences (e[f].c doesn't vary in the inner loop) cause issues with the interpretation of classical dependence analysis. I'm somewhat lost there. PR112859.
[Bug middle-end/115346] [15] Volatile load elimination with packed struct bitfields
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115346 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #4 from Richard Biener --- duplicate *** This bug has been marked as a duplicate of bug 99258 ***
[Bug middle-end/99258] volatile struct access optimized away
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99258 Richard Biener changed: What|Removed |Added CC||patrick at rivosinc dot com --- Comment #4 from Richard Biener --- *** Bug 115346 has been marked as a duplicate of this bug. ***
[Bug middle-end/115345] [12/13/14/15 Regression] Different outputs compared to GCC 11- and MSVC/Clang
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115345 --- Comment #12 from Richard Biener --- (In reply to Djordje Baljozovic from comment #11) > (In reply to Djordje Baljozovic from comment #9) > > (In reply to Andrew Pinski from comment #7) > > > A few questions, does `-fsanitize=undefined -fsanitize=address` report > > > anything? Does it work at -O0 and not just -O3? Does adding > > > -fno-strict-aliasing to the command line "fix" the crash? Are there any > > > warnings with `-Wextra -Wall` that might be causing an issue? > > > > Have not tested -O0 and -fno-strict-aliasing; will let you know if this > > fixed the problem. > > No warnings with -Wextra -Wall to my knowledge. > > > > Sincerely, > > George > > Hi Andrew and Jakub, > The results are more than interesting: > > 1. -fno-strict-aliasing: none of the inputs processed (with O3) > 2. O0: all but one input processed > 3. O3: none of the inputs processed > 4. O1 and O2: all inputs processed without any issues -- this did it. > > Now the question is: how on Earth did O1/O2 do the trick, and not O0?! Can you check whether -O0 works with the other compilers? It feels like you might be triggering some undefined behavior in your code. If you have a short running example that breaks with -O0 it might be also interesting to run it through valgrind to spot use-after-free or uninitialized use issues. > Once again, thanks a lot for your detailed and quick responses. > George > P.S. I will keep @Jakub's bisect idea in mind if something like this happens > in the future.
[Bug tree-optimization/115344] Missing loop counter reversal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Last reconfirmed||2024-06-05 --- Comment #1 from Richard Biener --- IVOPTs can do this with, and I also think without, the help of IVCANON, which could add a decrementing IV (it only does that for a constant number of iterations for some reason). I'm not sure why, for this example, IVOPTs doesn't add a candidate IV that decrements to zero. I see

Predict doloop failure due to target specific checks.

so the doloop candidate isn't added?
[Bug target/115342] [14/15 Regression] AArch64: Function multiversioning initialization incorrect
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115342 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.2
[Bug tree-optimization/113910] [12 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Known to work||12.3.1 Status|ASSIGNED|RESOLVED --- Comment #20 from Richard Biener --- Fixed.
[Bug tree-optimization/110381] [11 Regression] double counting for sum of structs of floating point types
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381 Richard Biener changed: What|Removed |Added Summary|[11/12 Regression] double |[11 Regression] double |counting for sum of structs |counting for sum of structs |of floating point types |of floating point types Priority|P3 |P2 Known to fail||12.3.0 Known to work||12.3.1
[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340 Richard Biener changed: What|Removed |Added Last reconfirmed||2024-06-04 Blocks||53947 Status|UNCONFIRMED |NEW Keywords||missed-optimization Ever confirmed|0 |1 --- Comment #1 from Richard Biener --- The issue is that the DRs for the loads tmp[0][i] and tmp[1][i] are not related - they are off different base pointers. At the moment we are not merging unrelated "groups" (even though the loads are not marked as grouped) into one SLP node. The stores are not considered "grouped" because they have gaps. With SLP-ification you'd get four instances and the same code-gen as now. To do better we'd have to improve the store dataref analysis to see that a vectorization factor of four would "close" the gaps, or more generally support store groups with gaps. Stores with gaps can be handled by masking for example. You get the store side handled when using -fno-tree-loop-vectorize to get basic-block vectorization after unrolling the loop. But you still run into the issue that we do not combine from different load groups during SLP discovery. That's another angle you can attack; during greedy discovery we also do not consider splitting the store but instead build the loads from scalars, which is of course less than optimal, especially since we do not re-process the built vector CTORs for further basic-block vectorization opportunities. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations
[Bug c++/115331] [13/14/15 Regression] ICE-on-invalid passing a typoed lambda to a list-initializer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115331 Richard Biener changed: What|Removed |Added Priority|P3 |P4
[Bug c/115326] __builtin_sub_overflow reports incorrect overflow value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115326 Richard Biener changed: What|Removed |Added Keywords||wrong-code CC||jakub at gcc dot gnu.org --- Comment #1 from Richard Biener --- We lower it as

int overflow1 = r->as_u64[0] = REALPART_EXPR <.SUB_OVERFLOW ((uint64_t) a->as_u64[0], (uint64_t) b->as_u64[0])>,
    (int) (_Bool) IMAGPART_EXPR <.SUB_OVERFLOW ((uint64_t) a->as_u64[0], (uint64_t) b->as_u64[0])>;

where the assignment to r->as_u64[0] is done before the re-evaluation for the overflow bit. A SAVE_EXPR is missing here? Jakub?
[Bug lto/115327] [ld] [lto] using ld and lto, crash while dynamic compile executable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115327 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID Target||arm --- Comment #1 from Richard Biener --- This bugzilla is for GCC but you are using clang. If you want to report a bug in binutils BFD ld their bugzilla is sourceware.org/bugzilla
[Bug gcov-profile/114751] .gcda:stamp mismatch with notes file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114751 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID CC||aoliva at gcc dot gnu.org --- Comment #10 from Richard Biener --- GCC 11 indeed had a big revamp of how auxiliary files (like .gcno) are named. In case of a single source file as in gcc -c src-file.c -o src-file.refo the auxiliary files are now named after the output file name with stripped extension. So for the above it should be src-file.gcno, the same as with -o src-file.o with GCC 10 or earlier you'd get src-file.refo-src-file.gcno The https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Overall-Options.html#index-dumpbase documentation explains this in detail. It was previously inconsistent but notably it's now different than it was before. Thanks for tracking the issue down, I consider this not a bug now but CCed Alex who implemented this change in case he has anything to add to the observed auxiliary file conflict of gcc -c src-file.c -o src-file.refo and gcc -c src-file.c [-o src-file.o]
[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304 Richard Biener changed: What|Removed |Added Target|sparc*-sun-solaris2.11 GCN |GCN --- Comment #8 from Richard Biener --- Should be fixed on sparc.
[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304 --- Comment #6 from Richard Biener --- For GCN the issue is that with vector(64) unsigned short we fail the permute (but we have { target vect64 } for this reason), but we then re-try with the same mode but with SLP disabled and that succeeds. The best strategy for GCN would be to gather V4QImode aka SImode into the V64QImode (or V16SImode) vector. For pix2 we have a gap of 28 elements, doing consecutive loads isn't a good strategy here. On x86 we can use a small vector and use half of it (gathers would be slow). On sparc we start with V8QImode which is great but then sparc doesn't seem able to build a V8QImode vector from two V4QImode vectors or have V2SImode and build from two SImode values (and load SImode from pix1/pix2; that's possibly due to alignment). I do see a vec_initv2sisi though. Ah, so we verify we can do the load using a permutation, permute two V8QImode 'a' and 'b' to get a { a_low, b_low } V8QImode vector. The other part is eliding of the gap that will end up loading half of the vector but then pad it out as { a_low, 0 } but then still invoke this unsupported permutation to get { a_low, b_low }. So in this case requiring vect_perm would fix this though there is sparc_vectorize_vec_perm_const and vec_perm<> guarded with VIS2. With -mvis2 we get past this failure point and run into

missed: not vectorized: relevant stmt not supported: _35 = (unsigned short) _34;

So there's no vec_unpack_{hi,lo}_v4hi. vect_unpack guards this. Maybe I should move the test to be x86 specific. I'll add the two dg-effective targets to fix the solaris fallout for now.
[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349 --- Comment #48 from Richard Biener --- (In reply to Christopher Nerz from comment #47) > But shouldn't both give the same value? I'm not sure what the standard says to this. Does std::launder(...) sanitize earlier "undefined behavior"? For example failing to initialize an object? > The return of the new and the std::launder(...) point to the same object and > are both equal read-operations! It is imho not predictable that they behave > differently. One load we can optimize to a constant, the other not (because of .LAUNDER).
[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349 Richard Biener changed: What|Removed |Added CC||jason at gcc dot gnu.org See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=101641 --- Comment #46 from Richard Biener --- (In reply to Christopher Nerz from comment #45)
> This is a critical bug which renders gcc unusable for safety relevant
> systems using expected/variant or simple ipc.
>
> You can get the same buggy behavior with far simpler code:
> https://godbolt.org/z/1WTnnYceM
>
> #include <cstdint>
> #include <new>
>
> bool check()
> {
>   // Just to prove that it is not a problem with alignment etc.
>   static_assert(alignof(double) == alignof(std::uint64_t));
>   static_assert(sizeof(double) == sizeof(std::uint64_t));
>
>   alignas(8) std::byte buffer[8]; // some buffer
>   new (buffer) double{1}; // some completely trivial data
>   // reuse memory -> double ends lifetime, uint64 starts lifetime
>   std::uint64_t * res = new (buffer) std::uint64_t;
>   // *res is allowed to be used as it is the correct pointer returned by new
>   // *res == 0x3ff0 // and gives correct value
>   // The very definition of std::launder says that it is supposed to be used as:
>   return (*res == *std::launder(reinterpret_cast<std::uint64_t *>(buffer)));
> }
>
> int main(int argc, char **argv) {
>   return check(); // gives false with activated O2 (true with O0)
> }
>
> We get the same behavior when initialising the memory at our version of
> "std::uint64_t * res = new (buffer) std::uint64_t;", but were unable to give
> a minimal example for that behavior.

For this case we end up with an indeterminate value for 'buffer' read as uint64_t but that indeterminate value is different from the one read after .LAUNDER. A somewhat early IL is

MEM[(double *)&buffer] = 1.0e+0;
_1 = MEM[(uint64_t *)&buffer];
_12 = .LAUNDER (&buffer);
_3 = *_12;
_13 = _1 == _3;

we then re-interpret 1.0e+0 as uint64_t and then remove the store as dead because there's no valid use - the *_12 load is done as uint64_t.
The effect is that the later load reads from uninitialized stack. Note that .LAUNDER only constitutes a data dependence between the &buffer and _12 pointer _values_ but there's no dependence on the memory contents pointed to - .LAUNDER is ECF_NOVOPS. That makes the compiler forget what _12 points to but it doesn't make later uint64_t loads via *_12 valid against an earlier store done via double.
[Bug pch/115312] [14/15 Regression] ICE when including a PCH via compiler option -include
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115312 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.2
[Bug c/115310] Option -Werror=return-type is too aggressive with -std=gnu89
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310 --- Comment #6 from Richard Biener --- (In reply to Florian Weimer from comment #3)
> This is just following the previous GCC behavior. For example, with GCC 11:
>
> $ gcc -S -Werror=return-type -std=gnu89 t.c
> t.c:1:1: error: return type defaults to ‘int’ [-Werror=return-type]
>     1 | main () { return 0; }
>       | ^~~~
>
> I'm not sure how this is a problem in practice.
>
> Using -Werror=return-type at the distribution level is … problematic. It's
> why we split -Werror=return-mismatch from it, and only enabled the latter by
> default in GCC 14.

But -Wreturn-mismatch doesn't diagnose the following, only -Wreturn-type does. IIRC we made -Werror=return-type the default mainly because of this.

int foo() { }

I realize -std=gnu89 isn't perfect but if sources are happy with that it's much better than -fpermissive - not only because -fpermissive only works (is not diagnosed) with GCC 14 for C. I also realize -std=gnu89 is going to run into this very same issue with older compilers. Bah.
[Bug c/115311] -fno-builtin-xxx allowing anything for xxx
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115311 --- Comment #3 from Richard Biener --- Note we handle -Wno-xyz similarly, but of course a typo like -fno-builtin-sun (s/sun/sin) isn't noticed this way which is the drawback.
[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255 Richard Biener changed: What|Removed |Added CC|richard.guenther at gmail dot com |rguenth at gcc dot gnu.org --- Comment #8 from Richard Biener --- (In reply to Andrew Pinski from comment #5) > The question comes is musttail going to always work at -O0 or should it just > fail at -O0 with an error message. Or rather is musttail is just a hack in > itself and should never be implemented. I think it's going to be quite useless if it doesn't work at -O0. I suppose even demoting the error on musttail to a warning when not optimizing would be an improvement. OTOH doing that generally (a warning, not error) might be a possibility as well. This isn't going to be a very portable feature since the ability to tail-call depends on the ABI.
[Bug c/115310] Option -Werror=return-type is too aggressive with -std=gnu89
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310 Richard Biener changed: What|Removed |Added CC||fweimer at redhat dot com --- Comment #1 from Richard Biener --- The logic that triggers is

  if (warn_about_return_type)
    permerror_opt (loc, flag_isoc99
                   ? OPT_Wimplicit_int
                   : (warn_return_type > 0
                      ? OPT_Wreturn_type : OPT_Wimplicit_int),
                   "return type defaults to %<int%>");

and it's all documented this way. We have -Werror=return-type to detect the case "Also warn if execution may reach the end of the function body, or if the function does not contain any return statement at all." It would be nice if -std=gnu89 -Werror=return-type -Wno-implicit-int would disable this particular instance about implicit int typed functions. It's really ugly to force old code to use -fpermissive instead of the much cleaner -std=gnu89 just because formerly, with the default of newer -std, we only had a warning for the implicit int while with -std=gnu89 we now get an error for it. Did I say I dislike -fpermissive? (which also gets you diagnostics for older compilers, so packages building in multiple distributions get more difficult to maintain)
[Bug target/115307] [avr] Don't expand isinf() like a built-in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115307 --- Comment #1 from Richard Biener --- The issue is that we probably fold isinff early. On x86 I see already in .original: return !(ABS_EXPR u<= 3.4028234663852885981170418348451692544e+38); I think your option is to provide optabs for isinf but make expansion of them always FAIL; (which is of course a quite ugly way)
[Bug target/115282] [15 regression] gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c fails after r15-812-gc71886f2ca2e46
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282 Richard Biener changed: What|Removed |Added Keywords||missed-optimization Priority|P3 |P1 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org Target|powerpc64-linux-gnu |powerpc64*-linux-gnu Status|NEW |ASSIGNED --- Comment #3 from Richard Biener --- Ah, this is probably a case where we need to split because CSE causes us to associate operations differently so SLP build for the whole thing fails. The three-vector permute issue will go away when I manage to finish the load part of the full SLP enablement. It also fails on LE. It's the node 0x39913f0 (max_nunits=4, refcnt=2) vector(4) unsigned int op template: _14 = in[_13]; stmt 0 _14 = in[_13]; load permutation { 6 } note. We split the 8-group into 6 and two times 1 element. This needs an intermediate (interleaving) permute and indeed the load part will fix it. I suggest to leave this failing until then. The loop is still vectorized but using non-SLP full interleaving until then.
[Bug tree-optimization/115303] gcc.dg/vect/pr112325.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115303 --- Comment #2 from Richard Biener --- Yeah, if requiring vect_shift works for you that's pre-approved.
[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304 Richard Biener changed: What|Removed |Added Keywords||testsuite-fail --- Comment #2 from Richard Biener --- It should only need vect32 - basically I assumed the target can compose the 64bit vector from two 32bit elements. But it might be that for this to work the loads would need to be aligned. What is needed is char-to-short unpacking and vector composition. Either composing V2SImode or V8QImode from two V4QImode vectors. Does the following help? diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c index 36463ca22c5..08942380caa 100644 --- a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c +++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c @@ -4,6 +4,9 @@ typedef unsigned char uint8_t; typedef short int16_t; void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) { + diff = __builtin_assume_aligned (diff, __BIGGEST_ALIGNMENT__); + pix1 = __builtin_assume_aligned (pix1, 4); + pix2 = __builtin_assume_aligned (pix2, 4); for (int y = 0; y < 4; y++) { for (int x = 0; x < 4; x++) diff[x + y * 4] = pix1[x] - pix2[x];
[Bug ada/115305] [15 Regression] many (162) acats regressions on i686-darwin9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115305 Richard Biener changed: What|Removed |Added Target||i686-darwin9 Target Milestone|--- |15.0
[Bug tree-optimization/115278] [13/14 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278 Richard Biener changed: What|Removed |Added Summary|[13/14/15 Regression] |[13/14 Regression] |-ftree-vectorize optimizes |-ftree-vectorize optimizes |away volatile write on |away volatile write on |x86_64 since r13-3219 |x86_64 since r13-3219 Known to work||15.0 --- Comment #10 from Richard Biener --- Fixed on trunk so far.
[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278 --- Comment #6 from Richard Biener --- (In reply to avieira from comment #5) > > I think we fixed similar bug on the read side. > > I don't have the best memory, but the one I can remember is PR 111882, where > we had the SAVE_EXPR. And the fix was to not lower bitfields with > non-constant offsets. > > Should dse_classify_store not return *_DEAD for volatiles? It's a low-level worker, it relies on the caller to have performed sanity checking on the stmt itself. I'm testing a patch doing that.
[Bug lto/115300] gcc 14 cannot compile itself on Windows when bootstrap-lto is specified
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115300 --- Comment #3 from Richard Biener --- Can you try --disable-plugin? It might be that the mingw equivalent of exporting all dynamic symbols from the cc1 binary runs into target limitations. It looks like the default on *-*-mingw* is disabled though ...
[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278 Richard Biener changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #4 from Richard Biener --- It's actually a latent issue, unrelated to bitfields? We elide the store via

  tree lhs = gimple_get_lhs (stmt);
  ao_ref write;
  ao_ref_init (&write, lhs);
  if (dse_classify_store (&write, stmt, false, NULL, NULL, latch_vdef)
      == DSE_STORE_DEAD)
    delete_dead_or_redundant_assignment (&gsi, "dead");

but that fails to guard against volatiles.
[Bug rtl-optimization/115297] [14/15 regression] alpha: ICE in simplify_subreg, at simplify-rtx.cc:7554 with -O1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115297 Richard Biener changed: What|Removed |Added Summary|[14 regression] alpha: ICE |[14/15 regression] alpha: |in simplify_subreg, at |ICE in simplify_subreg, at |simplify-rtx.cc:7554 with |simplify-rtx.cc:7554 with |-O1 |-O1 Target Milestone|--- |14.2
[Bug testsuite/115294] [15 regression] dg-additional-files-options change broke several testsuites
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115294 Richard Biener changed: What|Removed |Added Priority|P3 |P1
[Bug ada/115292] [15 Regression] i686-darwin17 bootstrap fails for Ada (between r15-856 and r15-889)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115292 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 Version|9.0 |15.0
[Bug c/115290] [12/13/14/15 Regression] tree check fail in c_tree_printer, at c/c-objc-common.cc:330
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115290 Richard Biener changed: What|Removed |Added Priority|P3 |P2
[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278 Richard Biener changed: What|Removed |Added Priority|P3 |P2 --- Comment #3 from Richard Biener --- I think we fixed similar bug on the read side.
[Bug middle-end/115277] [13/14/15 regression] ICF needs to match loop bound estimates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115277 Richard Biener changed: What|Removed |Added Target Milestone|--- |13.4
[Bug tree-optimization/115298] [15 Regression] Various targets failing DSE tests after recent changes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115298 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2024-05-31 Keywords||testsuite-fail Ever confirmed|0 |1 Target Milestone|--- |15.0 --- Comment #1 from Richard Biener --- Huh, I honestly have no idea how those targets would differ here ... I do see

void h (char * s)
{
  # PT = anything
  char * s_3(D) = s;
  char a[8];

  <bb 2> :
  __builtin_memset (&a, 0, 8);
  __builtin_strncpy (&a, s_3(D), 8);
  # USE = anything
  # CLB = anything
  frob (&a);
  a ={v} {CLOBBER(eos)};
  return;

for nds32-sim but

Deleted dead call: __builtin_memset (&a, 0, 8);

void h (char * s)
{
  # PT = nonlocal null
  char * s_3(D) = s;
  char a[8];

  <bb 2> :
  __builtin_strncpy (&a, s_3(D), 8);
  # USE = nonlocal escaped null { D.2716 } (escaped)
  # CLB = nonlocal escaped null { D.2716 } (escaped)
  frob (&a);
  a ={v} {CLOBBER(eos)};
  return;

for x86-64. But then the points-to solutions should not make any difference for DSE in this case ... (the points-to difference is odd in the first place of course). So for the points-to difference this is caused by -a = +a = INTEGER which likely means a different default of -fno-delete-null-pointer-checks or ADDR_SPACE_ADDRESS_ZERO_VALID. That causes us to bring in what the object at (void *)0 points to, and that's ANYTHING since we do not track objects at constant addresses in any way, and those might alias all other objects. The question is more why we generate a = at all, but that's a pre-existing issue. We now simply handle all this correctly (we didn't before, with latent wrong-code). Ah, and the DSE effect then is obviously that now 'strncpy (&a, s_3(D), ..)' reads from a since s_3(D) points to anything now (which includes 'a'), so we can no longer remove/trim an earlier store to 'a'. Ah, and the a = constraint is from the memset. Since we pass a to frob it escapes and everything escaped memory points to also escapes so anything escapes. So I'd say it works correctly now.
There might be a missing indirection between NONLOCAL and ESCAPED. Since s = &NONLOCAL, even when ANYTHING is in ESCAPED, ANYTHING isn't NONLOCAL itself (well, but of course technically s can point to NULL as well - another latent incorrectness in PTA, we do not track NULL conservatively, a correctness mistake with ADDR_SPACE_ADDRESS_ZERO_VALID). Btw, changing the testcase to

extern void frob (char *);
void h (char *s)
{
  char a[8];
  __builtin_memset (a, 1, sizeof a);
  __builtin_strncpy (a, s, sizeof a);
  frob (a);
}

shows the same effect on x86_64 - suddenly 'a' points to ANYTHING (0x010101010101...), which makes 's' point to ANYTHING and DSE is gone. Confirmed for the testsuite regression. I don't see how this is a bug though. Maybe the stack object 'a' can never be at address zero? Or any "fixed" address? I'm not sure that such a constraint can be modeled in PTA ("split" ANYTHING somehow). Adding -fdelete-null-pointer-checks to the test makes it succeed also on nds32le-elf.
[Bug target/115282] [15 regression] gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c fails after r15-812-gc71886f2ca2e46
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282 Richard Biener changed: What|Removed |Added Target Milestone|--- |15.0 Keywords||testsuite-fail Component|other |target Summary|15 regression] |[15 regression] |gcc.dg/vect/costmodel/ppc/c |gcc.dg/vect/costmodel/ppc/c |ostmodel-slp-12.c fails |ostmodel-slp-12.c fails |after |after |r15-812-gc71886f2ca2e46 |r15-812-gc71886f2ca2e46 --- Comment #1 from Richard Biener --- I don't see a good reason why, but I don't have a BE cross around to check myself. Does BE vect maybe not have unsigned integer vector multiplication support?
[Bug tree-optimization/115275] [14/15 Regression] Missed optimization for Dead Code Elimination
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115275 Richard Biener changed: What|Removed |Added Known to work||13.3.0 Keywords||missed-optimization, ||needs-bisection Priority|P3 |P2 Known to fail||14.1.0, 15.0 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 Target Milestone|--- |14.2 Last reconfirmed||2024-05-29 --- Comment #1 from Richard Biener --- Confirmed.
[Bug sanitizer/115273] [12 Regression] passing zero to ctz() check missing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115273 Richard Biener changed: What|Removed |Added Target Milestone|--- |12.4
[Bug debug/115272] [debug] complex type is hard to related back to base type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115272 --- Comment #2 from Richard Biener --- (In reply to Richard Biener from comment #1) > How does it work for 'double' vs. 'long double' themselves? > > <1><32>: Abbrev Number: 3 (DW_TAG_base_type) > <33> DW_AT_byte_size : 16 > <34> DW_AT_encoding: 4(float) > <35> DW_AT_name: (indirect string, offset: 0x60): long double > > so if it's not distinguishable via DW_AT_byte_size you look into > DW_AT_name as well? So it looks like doing the same for _Complex long double > is perfectly in line? Take for example powerpc with its dual IEEE and IBM long double 128 format.
[Bug debug/115272] [debug] complex type is hard to related back to base type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115272 --- Comment #1 from Richard Biener --- How does it work for 'double' vs. 'long double' themselves? <1><32>: Abbrev Number: 3 (DW_TAG_base_type) <33> DW_AT_byte_size : 16 <34> DW_AT_encoding: 4(float) <35> DW_AT_name: (indirect string, offset: 0x60): long double so if it's not distinguishable via DW_AT_byte_size you look into DW_AT_name as well? So it looks like doing the same for _Complex long double is perfectly in line?
[Bug tree-optimization/115252] The SLP vectorizer failed to perform automatic vectorization on pixel_sub_wxh of x264
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115252 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED Target||x86_64-*-* --- Comment #3 from Richard Biener --- This testcase should be fixed now.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 115252, which changed state. Bug 115252 Summary: The SLP vectorizer failed to perform automatic vectorization on pixel_sub_wxh of x264 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115252 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/114435] PCOM messes up vectorization some times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435 --- Comment #10 from Richard Biener --- (In reply to Richard Biener from comment #9) > So the "pcom messes up SLP" part should be fixed now. The pass dependence > of invariant/store motion and unswitching (and likely also loop splitting) is > something different. We may want to track this in a separate bug. Note there's a conditional (on graphite) LIM pass after high-level loop opts, it might be an option to turn it into an unconditional instance.
[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 Bug 53947 depends on bug 114435, which changed state. Bug 114435 Summary: PCOM messes up vectorization some times https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163 Bug 26163 depends on bug 114435, which changed state. Bug 114435 Summary: PCOM messes up vectorization some times https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435 What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED
[Bug tree-optimization/114435] PCOM messes up vectorization some times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435 Richard Biener changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED|RESOLVED --- Comment #9 from Richard Biener --- So the "pcom messes up SLP" part should be fixed now. The pass dependence of invariant/store motion and unswitching (and likely also loop splitting) is something different. We may want to track this in a separate bug.