[Bug libstdc++/109442] Dead local copy of std::vector not removed from function

2024-05-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442 --- Comment #19 from Jan Hubicka --- Note that the testcase from PR115037 also shows that we are not able to optimize out dead stores to the vector, which is another quite noticeable problem. void test() { std::vector<int> test;

[Bug middle-end/115037] Unused std::vector is not optimized away.

2024-05-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115037 Jan Hubicka changed: CC: added jason at redhat dot com,

[Bug middle-end/115037] New: Unused std::vector is not optimized away.

2024-05-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Compiling #include <vector> void test() { std::vector<int> test; test.push_back (1); } leads to _Z4testv: .LFB1253: .cfi_startproc subq $8, %rsp

[Bug middle-end/115036] New: division is not shortened based on value range

2024-05-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- For long test(long a, long b) { if (a > 65535 || a < 0) __builtin_unreachable (); if (b > 65535

[Bug ipa/114985] [15 regression] internal compiler error: in discriminator_fail during stage2

2024-05-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985 --- Comment #14 from Jan Hubicka --- So this is a problem in ipa_value_range_from_jfunc? It is Martin's code; I hope he will know why the types are wrong here. One can get type compatibility problems with mismatched declarations and LTO, but it seems

[Bug middle-end/114852] New: jpegxl 10.0.1 is faster with clang18 than with gcc14

2024-04-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3 reports about an 8% difference. I can measure 13% on Zen 3. The code has changed and it is no longer bound

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)

2024-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235 --- Comment #9 from Jan Hubicka --- Phoronix still reports the difference: https://www.phoronix.com/review/gcc14-clang18-amd-zen4/2

[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236 --- Comment #3 from Jan Hubicka --- It seems this performance difference is still there on Zen 4: https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3

[Bug tree-optimization/114787] [13 Regression] wrong code at -O1 on x86_64-linux-gnu (the generated code hangs)

2024-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787 --- Comment #18 from Jan Hubicka --- predict.cc queries the number of iterations using number_of_iterations_exit and loop_niter_by_eval, and finally using estimated_stmt_executions. The first two queries are not updating the upper bounds

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #13 from Jan Hubicka --- Thanks a lot, looks great! Do we still auto-detect memmove when the copy constructor turns out to be memcpy equivalent after optimization?

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #9 from Jan Hubicka --- Your patch gives me an error when compiling the testcase: jh@ryzen3:/tmp> ~/trunk-install/bin/g++ -O3 ~/t.C In file included from /home/jh/trunk-install/include/c++/14.0.1/vector:65, from

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #8 from Jan Hubicka --- I had the wrong noexcept specifier. This version works, but I still need to inline relocate_object_a into the loop: diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #6 from Jan Hubicka --- Thanks. I thought relocate_a only cares whether the pointed-to type can be bitwise copied. It would be nice for libstdc++ to produce memcpy early for std::pair, so the second patch makes sense

[Bug middle-end/114822] New: ldist should produce memcpy/memset/memmove histograms based on loop information converted

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- When a loop is converted to a string builtin, we lose information about its size. This means that we won't

[Bug libstdc++/114821] _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821 --- Comment #2 from Jan Hubicka --- What I am shooting for is to optimize it later in loop distribution. We can recognize a memcpy loop if we can figure out that the source and destination memory are different. We can help here with restrict, but I
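The following sketches the kind of loop that loop distribution can turn into a memcpy call. It assumes the pointers carry GCC's `__restrict__` extension, which lets the compiler assume that source and destination do not overlap. `copy_ints` is a hypothetical name. Whether a memcpy call is actually emitted depends on the GCC version and options; the pattern recognition is controlled by `-ftree-loop-distribute-patterns`.

```cpp
#include <cstddef>

// With __restrict__, dst and src may be assumed not to overlap, which is
// the fact loop distribution needs before replacing this loop by memcpy.
void copy_ints(int* __restrict__ dst, const int* __restrict__ src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```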

[Bug libstdc++/114821] New: _M_realloc_append should use memcpy instead of loop to copy data when possible

2024-04-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: libstdc++ Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- In the testcase #include typedef unsigned int uint32_t; std::pair pair; void test() { std::vector> st
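The following is a minimal sketch of the requested behavior, not the libstdc++ implementation. `relocate_n` and `U32Pair` are hypothetical names, and `U32Pair` stands in for a pair of `uint32_t`. For trivially copyable element types, a C++17 `if constexpr` selects a single `memcpy` over the whole range. Other types fall back to an element-by-element move-construct-and-destroy loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a pair of uint32_t; a plain aggregate like
// this is trivially copyable, so bitwise relocation is safe.
struct U32Pair {
    std::uint32_t first;
    std::uint32_t second;
};

// Hypothetical helper: relocate n objects from src into raw,
// non-overlapping storage at dst. Returns one past the last element.
template <typename T>
T* relocate_n(T* src, std::size_t n, T* dst) {
    if constexpr (std::is_trivially_copyable_v<T>) {
        // One memcpy instead of a per-element loop.
        if (n != 0)
            std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                        n * sizeof(T));
    } else {
        // General case: move-construct each element, then destroy the source.
        for (std::size_t i = 0; i < n; ++i) {
            ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
            src[i].~T();
        }
    }
    return dst + n;
}
```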

[Bug tree-optimization/114787] [13/14 Regression] wrong code at -O1 on x86_64-linux-gnu (the generated code hangs)

2024-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787 --- Comment #13 from Jan Hubicka --- -fdump-tree-all-all changing the generated code is also bad. We should probably avoid dumping loop bounds when they are not recorded. I added dumping of loop bounds, and this may be an unexpected side effect. Will

[Bug c++/93008] Need a way to make inlining heuristics ignore whether a function is inline

2024-04-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93008 --- Comment #8 from Jan Hubicka --- Note that the cold attribute is also quite strong, since it turns on optimize_size codegen, which is often a lot slower. Reading the discussion again, I don't think we have a way to make the inline keyword ignored by

[Bug tree-optimization/114779] __builtin_constant_p does not work in inline functions

2024-04-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment

[Bug middle-end/114774] Missed DSE in simple code due to interleaving stores

2024-04-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114774 Jan Hubicka changed: Summary: "Missed DSE in simple code" → "Missed DSE in simple code due to interleaving stores"

[Bug middle-end/114774] New: Missed DSE in simple code

2024-04-18 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- In the following #include <stdio.h> int a; short *p; void test (int b) { a=1; if (b) { (*p)++; a=2; printf ("
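The following is a reduced, hypothetical variant, not the PR's testcase, which is cut off above. It shows what makes an earlier store removable when another store is interleaved. The names `dse_g` and `dse_p` are made up. The sketch assumes `-fstrict-aliasing`, which GCC enables at `-O2`. Under that rule, an increment through a `short *` cannot read or write an `int` object, so the first store to `dse_g` is overwritten before anything can observe it.

```cpp
int dse_g;      // hypothetical stand-in for the PR's int a
short* dse_p;   // hypothetical stand-in for the PR's short *p

// The first store is dead: the interleaved store through a short* cannot
// access an int under type-based aliasing, and dse_g = 2 overwrites it.
void with_dead_store() {
    dse_g = 1;
    (*dse_p)++;
    dse_g = 2;
}

// What the function reduces to once dead store elimination drops it.
void after_dse() {
    (*dse_p)++;
    dse_g = 2;
}
```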

[Bug testsuite/109596] [14 Regression] Lots of guality testcase fails on x86_64 after r14-162-gcda246f8b421ba

2024-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596 --- Comment #19 from Jan Hubicka --- I looked into the remaining exit/nonexit rename discussed here earlier before the PR was closed. The following patch would restore the code to do the same calls as before my patch PR

[Bug lto/113208] [14 Regression] lto1: error: Alias and target's comdat groups differs since r14-5979-g99d114c15523e0

2024-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208 --- Comment #28 from Jan Hubicka --- So the main problem is that in t2 we have _ZN6vectorI12QualityValueEC1ERKS1_/7 (vector<_Tp>::vector(const vector<_Tp>&) [with _Tp = QualityValue]) Type: function definition analyzed alias

[Bug lto/113208] [14 Regression] lto1: error: Alias and target's comdat groups differs since r14-5979-g99d114c15523e0

2024-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208 --- Comment #27 from Jan Hubicka --- OK, but the problem is the same. Having comdats with the same key define different sets of public symbols is, IMO, not a good situation for either non-LTO or LTO builds. Unless the additional alias is never used by

[Bug lto/113208] [14 Regression] lto1: error: Alias and target's comdat groups differs since r14-5979-g99d114c15523e0

2024-04-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208 --- Comment #25 from Jan Hubicka --- So we have comdat groups that diverge between t1.o and t2.o. In one object the group contains an alias, while in the other it does not: Merging nodes for _ZN6vectorI12QualityValueEC2ERKS1_. Candidates:

[Bug ipa/113291] [14 Regression] compilation never (?) finishes with recursive always_inline functions at -O and above since r14-2172

2024-04-09 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291 --- Comment #8 from Jan Hubicka --- I am not sure this ought to be P1: - the compilation is technically finite, but does not finish in reasonable time; - it is possible to adjust the testcase (do early inlining manually) and get the same infinite build on

[Bug ipa/113359] [13/14 Regression] LTO miscompilation of ceph on aarch64 and x86_64

2024-04-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359 --- Comment #23 from Jan Hubicka --- The patch looks reasonable. We probably could hash the padding vectors at summary generation time to reduce WPA overhead, but that can be done incrementally next stage1. I however wonder if we really

[Bug ipa/109817] internal error in ICF pass on Ada interfaces

2024-04-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109817 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment

[Bug gcov-profile/113765] [14 Regression] ICE: autofdo: val-profiler-threads-1.c compilation, error: probability of edge from entry block not initialized

2024-03-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765 --- Comment #6 from Jan Hubicka --- Running auto-fdo without guessing branch probabilities is a somewhat odd idea in general. I suppose we can indeed just avoid setting the full_profile flag, though the optimization passes are not that much tested

[Bug testsuite/109596] [14 Regression] Lots of guality testcase fails on x86_64 after r14-162-gcda246f8b421ba

2024-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #7 from Jan Hubicka --- Found it, probably. I renamed exit to nonexit (since name was misleading) and then forgot to update propagate_threaded_block_debug_into (exit->dest, entry->dest); I will check this after teaching (w

[Bug testsuite/109596] [14 Regression] Lots of guality testcase fails on x86_64 after r14-162-gcda246f8b421ba

2024-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596 --- Comment #6 from Jan Hubicka --- On this testcase, trunk produces the same dump as GCC 13 for the pass just before ch2; with ch2 we get: @@ -192,9 +236,8 @@ # DEBUG BEGIN_STMT goto ; [100.00%] - [local count: 954449105]: + [local count:

[Bug testsuite/109596] [14 Regression] Lots of guality testcase fails on x86_64 after r14-162-gcda246f8b421ba

2024-03-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596 --- Comment #4 from Jan Hubicka --- The change makes loop iteration estimates more realistic, but does not introduce any new code that actually changes the IL, so it seems this makes an existing problem more visible. I will try to debug what

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #59 from Jan Hubicka --- Just to explain what happens in the testcase: there are test and testb, which are almost the same: int testb(void) { struct bar *fp; test2 ((void *)); fp = NULL; (*ptr)++; test3 ((void *)); } the

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #58 from Jan Hubicka --- Created attachment 57702 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57702&action=edit Compare value ranges in jump functions This patch implements the jump function comparison; however, it is not good

[Bug ipa/113907] [11/12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-03-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #55 from Jan Hubicka --- > Anyway, can we in the spot my patch changed just walk all > source->node->callees > cgraph_edges, for each of them find the corresponding > cgraph_edge in the alias > and for each walk all the

[Bug ipa/106716] Identical Code Folding (-fipa-icf) confuses between functions with different [[likely]] attributes

2024-03-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106716 --- Comment #6 from Jan Hubicka --- The reason why GIMPLE_PREDICT is ignored is that it is never used after ipa-icf and gets removed at the very beginning of late optimizations. GIMPLE_PREDICT is consumed by the profile_generate pass, which is
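The following pair of hypothetical functions illustrates the situation in the bug title. The bodies differ only in a C++20 `[[likely]]`/`[[unlikely]]` hint. The comment above says GIMPLE_PREDICT statements are ignored, so `-fipa-icf` may treat the two as identical and fold them into one. One of the two sets of hints would then be lost. The sketch only shows the source shape; whether folding happens depends on the GCC version and flags.

```cpp
// Identical control flow; only the branch-prediction hint differs.
int clamp_likely(int x) {
    if (x > 0) [[likely]] {
        return x;
    }
    return 0;
}

int clamp_unlikely(int x) {
    if (x > 0) [[unlikely]] {
        return x;
    }
    return 0;
}
```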

[Bug lto/114241] False-positive -Wodr warning when using -flto and -fno-semantic-interposition

2024-03-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114241 Jan Hubicka changed: Assignee: unassigned at gcc dot gnu.org → hubicka at gcc dot gnu.org

[Bug debug/92387] [11/12/13 Regression] gcc generates wrong debug information at -O1 since r10-1907-ga20f263ba1a76a

2024-03-04 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92387 --- Comment #5 from Jan Hubicka --- The revision changes inlining decisions, so it would probably be possible to reproduce the problem without that change by using the right always_inline and noinline attributes.

[Bug tree-optimization/114207] [12/13/14 Regression] modref gets confused by vectorized code ` -O3 -fno-tree-forwprop` since r12-5439

2024-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #3 from Jan Hubicka --- mine. The summary is: loads: Base 0: alias set 1 Ref 0: alias set 1 access: Parm 0 param offset:4 offset:0 size:64 max_size:64 stores: Base 0: alias set 1 Ref 0: alias

[Bug lto/85432] Wodr can be more verbose for C code

2024-03-03 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85432 Jan Hubicka changed: Status: UNCONFIRMED → RESOLVED; Resolution: --- →

[Bug tree-optimization/114052] [11/12/13/14 Regression] Wrong code at -O2 for well-defined infinite loop

2024-02-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114052 --- Comment #5 from Jan Hubicka --- So if I understand it right, you want to determine the property that if the loop header is executed, then the BB containing undefined behavior at that iteration will be executed, too. modref tracks whether a function

[Bug ipa/108802] [11/12/13/14 Regression] missed inlining of call via pointer to member function

2024-02-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802 --- Comment #5 from Jan Hubicka --- I don't think we can reasonably expect every caller of a lambda function to be early inlined, so we need to extend ipa-prop to understand the obfuscated code. I discussed that with Martin some time ago; I

[Bug ipa/111960] [14 Regression] ICE: during GIMPLE pass: rebuild_frequencies: SIGSEGV (Invalid read of size 4) with -fdump-tree-rebuild_frequencies-all

2024-02-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111960 --- Comment #5 from Jan Hubicka --- hmm. cfg.cc:815 for me is: fputs (", maybe hot", outf); which seems quite safe. The problem does not seem to reproduce for me: jh@ryzen3:~/gcc/build/gcc> ./xgcc -B ./ tt.c -O

[Bug middle-end/113907] [12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 Jan Hubicka changed: Summary: "[14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41" → "[12/13/14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41"

[Bug middle-end/113907] [14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-16 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #39 from Jan Hubicka --- This testcase #include <stdio.h> int data[100]; __attribute__((noinline)) int bar (int d, unsigned int d2) { if (d2 > 10) printf ("Bingo\n"); return d + d2; } int test2 (unsigned int i) { if (i > 10)

[Bug middle-end/113907] [14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-15 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 --- Comment #31 from Jan Hubicka --- Having a testcase is great. I was just playing with crafting one. I am still concerned about value ranges in ipa-prop's jump functions. Let me see if I can modify the testcase to also trigger the problem with

[Bug ipa/113291] [14 Regression] compilation never (?) finishes with recursive always_inline functions at -O and above since r14-2172

2024-02-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291 --- Comment #6 from Jan Hubicka --- Created attachment 57427 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57427&action=edit patch The patch makes compilation finish in reasonable time. I ended up needing to drop DISREGARD_INLINE_LIMITS

[Bug middle-end/111054] [14 Regression] ICE: in to_sreal, at profile-count.cc:472 with -O3 -fno-guess-branch-probability since r14-2967

2024-02-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111054 Jan Hubicka changed: Status: NEW → RESOLVED; Resolution: --- →

[Bug ipa/113291] [14 Regression] compilation never (?) finishes with recursive always_inline functions at -O and above since r14-2172

2024-02-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291 --- Comment #5 from Jan Hubicka --- There is a cap in want_inline_self_recursive_call_p which gives up on inlining after reaching the maximum recursive inlining depth of 8. The problem is that the tree here is too wide. After early inlining f0 contains 4

[Bug ipa/113291] [14 Regression] compilation never (?) finishes with recursive always_inline functions at -O and above since r14-2172

2024-02-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291 --- Comment #4 from Jan Hubicka --- There is a cap in want_inline_self_recursive_call_p which gives up on inlining after reaching the maximum recursive inlining depth of 8. The problem is that the tree here is too wide. After early inlining f0 contains 4
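A depth cap alone does not bound the amount of code when the recursion tree is wide. The following small helper (hypothetical, not GCC code) counts the function bodies in a fully expanded call tree. It assumes every body contains `fanout` self-recursive calls and that expansion stops after `depth` levels, which gives 1 + k + k^2 + ... + k^depth. With the depth limit of 8 mentioned above and an illustrative fan-out of 4, the count is already 87381.

```cpp
// Bodies in a fully expanded recursion tree:
// sum of fanout^d for d = 0 .. depth (root counted as level 0).
constexpr unsigned long long expanded_bodies(unsigned long long fanout,
                                             unsigned depth) {
    unsigned long long total = 0;
    unsigned long long level = 1;
    for (unsigned d = 0; d <= depth; ++d) {
        total += level;
        level *= fanout;
    }
    return total;
}
```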

[Bug middle-end/113907] [14 regression] ICU miscompiled since on x86 since r14-5109-ga291237b628f41

2024-02-14 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-13 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787 --- Comment #13 from Jan Hubicka --- So my understanding is that ivopts does something like offset = - and then translates val = base2[i] to val = *((base1+i)+offset), where (base1+i) is then an IV. I wonder if we consider doing
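The following source-level sketch shows the rewrite described above. Only one induction variable is kept, and it walks `base1`. `base2[i]` is then reached by adding a fixed offset. The assumption here is that the offset is the byte distance between the two bases. That part of the comment is elided, and the function names are hypothetical. The sketch uses `std::uintptr_t` arithmetic because, in ISO C++, subtracting pointers into two different objects is undefined. The conversion is implementation-defined and round-trips on flat address spaces such as x86-64 and aarch64.

```cpp
#include <cstddef>
#include <cstdint>

// Source form: index base2 directly.
long sum_indexed(const int* base2, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += base2[i];
    return s;
}

// Rewritten form (sketch): a single IV walks base1, and each access adds
// the distance between the two objects to reach base2[i].
long sum_via_offset(const int* base1, const int* base2, std::size_t n) {
    const std::uintptr_t offset =
        reinterpret_cast<std::uintptr_t>(base2) -
        reinterpret_cast<std::uintptr_t>(base1);
    long s = 0;
    for (const int* iv = base1; iv != base1 + n; ++iv)
        s += *reinterpret_cast<const int*>(
            reinterpret_cast<std::uintptr_t>(iv) + offset);
    return s;
}
```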

[Bug tree-optimization/113787] [12/13/14 Regression] Wrong code at -O with ipa-modref on aarch64

2024-02-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787 --- Comment #8 from Jan Hubicka --- I will take a look. Mod-ref only reuses the code detecting erroneous paths in ssa-split-paths, so that code will get confused, too. It makes sense for ivopts to compute the difference of two memory allocations,

[Bug ipa/113359] [13 Regression] LTO miscompilation of ceph on aarch64

2024-02-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359 --- Comment #11 from Jan Hubicka --- If there are two ODR types with the same ODR name, one with an integer and the other with a pointer type as the third field, then indeed we should get an ODR warning and give up on handling them as ODR types for type merging. So

[Bug ipa/97119] Top level option to disable creation of IPA symbols such as .localalias is desired

2024-02-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97119 --- Comment #7 from Jan Hubicka --- Local aliases are created by the ipa-visibility pass. The most common case is that a function is declared inline, but ELF interposition rules say that the symbol can be overwritten by a different library. Since GCC

[Bug ipa/113422] Missed optimizations in the presence of pointer chains

2024-01-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113422 --- Comment #2 from Jan Hubicka --- Iterating read-only variable discovery would be quite expensive, since you need to interleave it with early opts each round. I wonder how LLVM handles this. I think there is more hope with IPA-PTA getting scalable

[Bug ipa/113520] ICE with mismatched types with LTO (tree check: expected array_type, have integer_type in array_ref_low_bound)

2024-01-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113520 --- Comment #8 from Jan Hubicka --- I think the ipa-cp summaries should be used only when types match. At least Martin added type streaming for all the jump functions. So we are missing some check?

[Bug tree-optimization/110852] [14 Regression] ICE: in get_predictor_value, at predict.cc:2695 with -O -fno-tree-fre and __builtin_expect() since r14-2219-geab57b825bcc35

2024-01-17 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852 Jan Hubicka changed: Status: NEW → RESOLVED; Resolution: --- →

[Bug c++/109753] [13/14 Regression] pragma GCC target causes std::vector not to compile (always_inline on constructor)

2024-01-10 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753 --- Comment #12 from Jan Hubicka --- I think this is a problem with the two meanings of always_inline. One is "it must be inlined or otherwise we will not be able to generate code"; the other is "disregard inline limits". I guess a practical solution

[Bug middle-end/79704] [meta-bug] Phoronix Test Suite compiler performance issues

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704 Bug 79704 depends on bug 109811, which changed state. Bug 109811 Summary: libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 What|Removed |Added

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 Jan Hubicka changed: Status: NEW → RESOLVED; Resolution: --- →

[Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
||2024-01-05 CC: added hubicka at gcc dot gnu.org; Status: UNCONFIRMED → NEW --- Comment #2 from Jan Hubicka --- On Zen 3 I get 0.75 MP/s for GCC and 0.80 MP/s for Clang, so only 6.6%, but it seems reproducible. The profile looks

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235 --- Comment #6 from Jan Hubicka --- The internal loops are: static const unsigned keccakf_rotc[24] = { 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 }; static const unsigned keccakf_piln[24] = {
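The following simplified sketch shows why complete peeling matters here. It uses the `keccakf_rotc` table quoted above. The `keccakf_piln` permutation is left out because its table is cut off. `rotl64` and `rotate_lanes` are hypothetical names. While the loop stays rolled, each rotation count is loaded from the table at run time. Once the 24 iterations are completely peeled, every count becomes a compile-time constant, and the compiler can emit rotate-by-immediate instructions instead.

```cpp
#include <cstdint>

// Rotation offsets as quoted in the comment above.
static const unsigned keccakf_rotc[24] = {1,  3,  6,  10, 15, 21, 28, 36,
                                          45, 55, 2,  14, 27, 41, 56, 8,
                                          25, 43, 62, 18, 39, 61, 20, 44};

// 64-bit rotate left; masking keeps both shift counts in [0, 63].
inline std::uint64_t rotl64(std::uint64_t x, unsigned r) {
    r &= 63;
    return (x << r) | (x >> ((64 - r) & 63));
}

// Simplified rho step: rotate each lane by a table-driven amount.
void rotate_lanes(const std::uint64_t in[24], std::uint64_t out[24]) {
    for (int i = 0; i < 24; ++i)
        out[i] = rotl64(in[i], keccakf_rotc[i]);
}
```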

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235 Jan Hubicka changed: Summary: "SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang" → "SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling)"

[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang

2024-01-05 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2024-01-01 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 --- Comment #23 from Jan Hubicka --- Created attachment 56970 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56970&action=edit Patch I am testing Hi, this adds the -falign-all-functions parameter. It still looks more reasonable (and backward

[Bug ipa/92606] [11/12/13 Regression][avr] invalid merge of symbols in progmem and data sections

2023-12-12 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606 --- Comment #31 from Jan Hubicka --- This is Martin's code, but I agree that equals_wpa should reject pairs with "dangerous" attributes on them (ideally we should hash them). I think we could add a test for identical attributes to equals_wpa and

[Bug ipa/81323] IPA-VRP doesn't handle return values

2023-12-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81323 Jan Hubicka changed: CC: added hubicka at gcc dot gnu.org --- Comment #9

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-12-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
at gcc dot gnu.org |hubicka at gcc dot gnu.org --- Comment #18 from Jan Hubicka --- Reading all the discussion again, I am leaning towards -falign-all-functions + documentation update explaining that -falign-functions/-falign-loops are optimizations and ignored for -Os. I do use -falign

[Bug tree-optimization/110062] missed vectorization in graphicsmagick

2023-11-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062 --- Comment #11 from Jan Hubicka --- trunk -O3 -flto -march=native -fopenmp Operation: Sharpen: 257 256 256 Average: 256 Iterations Per Minute GCC13 -O3 -flto -march=native -fopenmp 257 256

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-11-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #18 from Jan Hubicka --- I made a typo: Mainline with -O2 -flto -march=native run manually since build machinery patch is needed 23.03 22.85 23.04 Should be Mainline with -O3 -flto -march=native run

[Bug target/109812] GraphicsMagick resize is a lot slower in GCC 13.1 vs Clang 16 on Intel Raptor Lake

2023-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812 --- Comment #20 from Jan Hubicka --- On zen4 hardware I now get GCC13 with -O3 -flto -march=native -fopenmp 2163 2161 2153 Average: 2159 Iterations Per Minute clang 17 with -O3 -flto -march=native -fopenmp

[Bug middle-end/112653] PTA should handle correctly escape information of values returned by a function

2023-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #8 from Jan Hubicka --- On ARM32 and other targets, methods return the this pointer. Together with making the return value escape, this probably completely disables any chance of IPA tracking of C++ data types...

[Bug middle-end/110015] openjpeg is slower when built with gcc13 compared to clang16

2023-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015 --- Comment #10 from Jan Hubicka --- Runtimes on Zen 4 hardware: trunk -O3 -flto -march=native 42171 42964 42106 clang -O3 -flto -march=native 37393 37423 37508 gcc 13 -O3 -flto -march=native

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #15 from Jan Hubicka --- With SRA improvements r:aae723d360ca26cd9fd0b039fb0a616bd0eae363 we finally get good performance at -O2. Improvements to the push_back implementation also help a bit. Mainline with default flags (-O2):

[Bug middle-end/112706] New: missed simplification in FRE

2023-11-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- Compiling the following testcase (simplified from repeated std::vector::push_back expansion): int *ptr; void link_error (); void test () { int *ptr1 = ptr + 10; int

[Bug middle-end/112653] PTA should handle correctly escape information of values returned by a function

2023-11-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #7 from Jan Hubicka --- Thanks for the explanation. I think it is quite a common pattern that a new object is constructed, worked on, and later returned, so I think we ought to handle this correctly. Another example just came up in

[Bug rtl-optimization/112657] [13/14 Regression] missed optimization: cmove not used with multiple returns

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657 --- Comment #8 from Jan Hubicka --- The negative return value branch predictor is set to have a 98% hit rate (measured on SPEC2k17 some time ago). There is also --param predictable-branch-outcome, which is set to 2%, so indeed we consider the branch

[Bug ipa/98925] Extend ipa-prop to handle return functions for slot optimization

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98925 --- Comment #3 from Jan Hubicka --- Return value range propagation was added in r:53ba8d669550d3a1f809048428b97ca607f95cf5 however it works on scalar return values only for now. Extending it to aggregates is a logical next step and should not
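The following hypothetical pair of functions shows the scalar case that return value range propagation can already handle, according to the comment above. `low_nibble` is kept out of line with `noinline`, and its result is always in [0, 15]. With return-range information, GCC may fold the `r > 15` test in the caller to false without inlining. Whether that happens depends on the GCC version. A function returning an aggregate holding the same value is, per the comment, not covered yet.

```cpp
// The return value is always in [0, 15].
__attribute__((noinline)) unsigned low_nibble(unsigned x) {
    return x & 15u;
}

unsigned checked_low_nibble(unsigned x) {
    unsigned r = low_nibble(x);
    if (r > 15u)
        return ~0u;  // never taken; a candidate for removal via the return range
    return r;
}
```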

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-11-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345 Jan Hubicka changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment

[Bug middle-end/112653] We should optimize memmove to memcpy using alias oracle

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653 --- Comment #3 from Jan Hubicka --- The PR82898 testcases seem to be about type-based alias analysis. However, PTA should be usable here.

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 Bug 109849 depends on bug 110377, which changed state. Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377 What|Removed

[Bug libstdc++/110287] _M_check_len is expensive

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 Bug 110287 depends on bug 110377, which changed state. Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377 What|Removed

[Bug middle-end/110377] Early VRP and IPA-PROP should work out value ranges from __builtin_unreachable

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377 Jan Hubicka changed: What|Removed |Added Resolution|--- |FIXED Status|NEW
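
A small hypothetical example of the idiom named in the bug title. The __builtin_unreachable() call (a GCC/Clang builtin) tells the compiler that n lies in [0, 1000], so value range propagation may treat the later n > 5000 test as always false. The function name bucket is illustrative only.

```cpp
// Hypothetical example. Reaching __builtin_unreachable() is undefined
// behaviour, so the compiler may assume n is in [0, 1000] afterwards.
int bucket(int n) {
  if (n < 0 || n > 1000)
    __builtin_unreachable();
  // Given that range, this condition is always false and may be removed.
  if (n > 5000)
    return -1;
  return n / 10;
}
```

Callers must respect the stated range. Passing 2000 here is undefined behaviour, not a request for -1.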

[Bug middle-end/112653] New: We should optimize memmove to memcpy using alias oracle

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- In this testcase (loosely based on the libstdc++ implementation of vectors) we should be able to turn memmove into memcpy because we know
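
The testcase itself is cut off above. The following is a separate hypothetical sketch of the general situation. grow is an illustrative name, not a libstdc++ function. The destination comes from a fresh malloc, so it cannot overlap the source, and a points-to or alias oracle that can prove this could turn the memmove into memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch loosely modelled on vector reallocation: elements are
// relocated from the old buffer into a freshly allocated one with memmove.
int *grow(const int *old, std::size_t n, std::size_t new_cap) {
  assert(n <= new_cap);
  int *fresh = static_cast<int *>(std::malloc(new_cap * sizeof(int)));
  if (fresh == nullptr)
    return nullptr;
  // fresh is a new allocation, so it cannot overlap old. An alias oracle
  // that can prove this could replace the memmove with memcpy.
  if (n != 0)
    std::memmove(fresh, old, n * sizeof(int));
  return fresh;
}
```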

[Bug libstdc++/110287] _M_check_len is expensive

2023-11-21 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 Jan Hubicka changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever confirmed|0

[Bug libstdc++/110287] _M_check_len is expensive

2023-11-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #9 from Jan Hubicka --- This is _M_realloc_insert at release_ssa time: Released 63 names, 165.79%, removed 63 holes void std::vector::_M_realloc_insert (struct vector * const this, struct iterator __position, const struct pair_t &
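
For context, _M_check_len computes the new capacity that _M_realloc_insert allocates. Below is a simplified standalone model of that computation as done in libstdc++'s std::vector. check_len is a hypothetical free-function version that takes max_size explicitly, and it is not the library source itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

// Simplified standalone model of vector::_M_check_len (not the actual
// source). size: current element count; n: number of elements to add.
// Precondition: size <= max_size.
constexpr std::size_t check_len(std::size_t size, std::size_t n,
                                std::size_t max_size) {
  if (max_size - size < n)
    throw std::length_error("vector::_M_realloc_insert");
  // Grow geometrically: at least double, or by n if that is larger.
  const std::size_t len = size + std::max(size, n);
  // Clamp to max_size on overflow or when the result exceeds it.
  return (len < size || len > max_size) ? max_size : len;
}
```

Even in this reduced form there are two range checks and a throwing path on every reallocation, which is part of the code inlining has to pay for.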

[Bug middle-end/109849] suboptimal code for vector walking loop

2023-11-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849 --- Comment #21 from Jan Hubicka --- Patch https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html gets us closer to inlining _M_realloc_insert at -O3 (3 insns away) Patch

[Bug libstdc++/110287] _M_check_len is expensive

2023-11-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287 --- Comment #8 from Jan Hubicka --- With return value range propagation, https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html reduces the --param max-inline-insns-auto value needed for _M_realloc_insert to be inlined on my testcase from 39

[Bug tree-optimization/112618] New: internal compiler error: in expand_MASK_CALL, at internal-fn.cc:4529

2023-11-19 Thread hubicka at gcc dot gnu.org via Gcc-bugs
: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- jh@ryzen4:~/gcc/build4/stage1-gcc> cat b.c /* PR tree-optimization/106433 */ int m, *p; __attribute__ ((s

[Bug tree-optimization/110641] [14 Regression] ICE in adjust_loop_info_after_peeling, at tree-ssa-loop-ivcanon.cc:1023 since r14-2230-g7e904d6c7f2

2023-11-06 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110641 Jan Hubicka changed: What|Removed |Added Status|NEW |ASSIGNED --- Comment #3 from Jan Hubicka

[Bug target/109811] libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16

2023-11-02 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811 --- Comment #13 from Jan Hubicka --- So I re-tested it with current mainline and clang 16/17. For mainline I get (megapixels per second, bigger is better): 13.39 13.38 13.42 clang 16: 20.06 20.06
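
The gap can be expressed as a ratio using only the samples visible in the excerpt, since the clang 16 list is cut off after two values. This is plain arithmetic on the quoted numbers, not a new measurement.

```cpp
#include <array>
#include <cstddef>

// Samples quoted in the comment (megapixels per second, bigger is better).
// Only the two clang 16 samples visible in the excerpt are used.
constexpr std::array<double, 3> gcc_mainline{13.39, 13.38, 13.42};
constexpr std::array<double, 2> clang16{20.06, 20.06};

template <std::size_t N>
constexpr double mean(const std::array<double, N> &xs) {
  double sum = 0.0;
  for (std::size_t i = 0; i < N; ++i)
    sum += xs[i];
  return sum / N;
}

// Ratio of mean throughputs, clang 16 over GCC mainline: about 1.50x.
constexpr double speedup = mean(clang16) / mean(gcc_mainline);
```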

[Bug ipa/59948] Optimize std::function

2023-09-25 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59948 --- Comment #8 from Jan Hubicka --- Trunk optimizes the code to return 0, but fails to optimize out functions which become unused after indirect inlining. With -fno-early-inlining we end up with: int m () { void * D.48296; int __args#0; struct

[Bug middle-end/111573] New: lambda functions often not inlined and optimized out

2023-09-24 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- #include using namespace std; static int dosum(std::function fn) { return fn(5,6); } int test() { auto sum = [](int a, int b) { return a + b
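
A self-contained variant of the pattern in the report. It assumes that the header stripped from the excerpt is <functional> and that the std::function takes two ints and returns an int. Both are assumptions, not recovered text. Ideally test() folds to "return 11;", with dosum and the lambda removed entirely.

```cpp
#include <functional>

// Variant of the reported pattern (template argument assumed): a lambda is
// passed through std::function to a static helper that calls it once.
static int dosum(std::function<int(int, int)> fn) { return fn(5, 6); }

int test() {
  auto sum = [](int a, int b) { return a + b; };
  return dosum(sum);  // semantically always 11
}
```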

[Bug middle-end/111552] New: 549.fotonik3d_r regression with -O2 -flto -march=native on zen between g:85d613da341b7630 (2022-06-21 15:51) and g:ecd11acacd6be57a (2022-07-01 16:07)

2023-09-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
16:07) Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone

[Bug middle-end/111551] New: Fix for PR106081 is not working with profile feedback on imagemagick

2023-09-23 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- As seen in https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0=473.507.0=475.507.0=477.507.0; Fix for PR106081

[Bug tree-optimization/111498] New: 951% profile quality regression between g:93996cfb308ffc63 (2023-09-18 03:40) and g:95d2ce05fb32e663 (2023-09-19 03:22)

2023-09-20 Thread hubicka at gcc dot gnu.org via Gcc-bugs
Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hubicka at gcc dot gnu.org Target Milestone: --- This is seen here on tramp3d -fprofile-use

[Bug middle-end/110973] 9% 444.namd regression between g:c2a447d840476dbd (2023-08-03 18:47) and g:73da34a538ddc2ad (2023-08-09 20:17)

2023-08-29 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973 --- Comment #5 from Jan Hubicka --- Note that some (not all?) namd scores seem to be back to pre-regression levels: https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.120.0 https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=791.120.0

[Bug ipa/111157] [14 Regression] 416.gamess fails with a run-time abort when compiled with -O2 -flto after r14-3226-gd073e2d75d9ed4

2023-08-26 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111157 --- Comment #4 from Jan Hubicka --- So here ipa-modref declares the field dead, while ipa-prop determines its value even if it is unused and makes it used later? I think a dead argument is probably better than optimizing out one store, so I

[Bug middle-end/111054] [14 Regression] ICE: in to_sreal, at profile-count.cc:472 with -O3 -fno-guess-branch-probability

2023-08-22 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111054 --- Comment #2 from Jan Hubicka --- This is a missing check for profile presence (we cannot convert an undefined probability to sreal). I will fix that.
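
The shape of such a check can be sketched with a hypothetical model. Probability, to_real and maybe_to_real below are illustrative stand-ins, not GCC's profile_probability class or its to_sreal method. The conversion is only meaningful for an initialized probability, so the caller tests for presence first and yields an empty std::optional when no profile exists.

```cpp
#include <optional>

// Hypothetical model, not GCC's profile_probability class.
struct Probability {
  bool initialized = false;  // false when no profile information exists
  int value = 0;             // fixed-point numerator, valid only if initialized
  static constexpr int kBase = 1 << 30;  // hypothetical fixed-point base
};

// Conversion that is only valid for initialized probabilities.
constexpr double to_real(const Probability &p) {
  return static_cast<double>(p.value) / Probability::kBase;
}

// Guarded caller: check for profile presence before converting.
constexpr std::optional<double> maybe_to_real(const Probability &p) {
  if (!p.initialized)
    return std::nullopt;
  return to_real(p);
}
```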
