https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109442
--- Comment #19 from Jan Hubicka ---
Note that the testcase from PR115037 also shows that we are not able to
optimize out dead stores to the vector, which is another quite noticeable
problem.
void
test()
{
std::vector<int> test;
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115037
Jan Hubicka changed:
What|Removed |Added
CC||jason at redhat dot com,
: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Compiling
#include <vector>
void
test()
{
std::vector<int> test;
test.push_back (1);
}
leads to
_Z4testv:
.LFB1253:
.cfi_startproc
subq $8, %rsp
: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
For
long test(long a, long b)
{
if (a > 65535 || a < 0)
__builtin_unreachable ();
if (b > 65535
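
The testcase above is cut off in this listing. As a rough, self-contained sketch of the same idiom (the function name sum_in_range and its body are assumptions made for illustration, not the code from the report), __builtin_unreachable () can be used to tell GCC that both arguments lie in [0, 65535]:

```cpp
// Hypothetical example; only the style of the range checks mirrors the
// truncated testcase, the rest is an assumption.
long sum_in_range (long a, long b)
{
  // Reaching __builtin_unreachable () is undefined behavior, so the
  // compiler may assume each guarded condition is false.
  if (a > 65535 || a < 0)
    __builtin_unreachable ();
  if (b > 65535 || b < 0)
    __builtin_unreachable ();
  // Under those assumptions the sum lies in [0, 131070].
  return a + b;
}
```

Callers have to respect the stated ranges themselves; passing a value outside them is undefined behavior.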
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114985
--- Comment #14 from Jan Hubicka ---
So this is a problem in ipa_value_range_from_jfunc?
It is Martin's code, I hope he will know why types are wrong here.
One can get type compatibility problems on mismatched declarations and LTO, but
it seems
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3
reports about 8% difference. I can measure 13% on zen3. The code has changed
and it is no longer bound
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235
--- Comment #9 from Jan Hubicka ---
Phoronix still claims the difference
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
--- Comment #3 from Jan Hubicka ---
It seems this performance difference is still there on zen4
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787
--- Comment #18 from Jan Hubicka ---
predict.cc queries number of iterations using number_of_iterations_exit and
loop_niter_by_eval and finally using estimated_stmt_executions.
The first two queries are not updating the upper bounds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821
--- Comment #13 from Jan Hubicka ---
Thanks a lot, looks great!
Do we still auto-detect memmove when the copy constructor turns out to be
memcpy equivalent after optimization?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821
--- Comment #9 from Jan Hubicka ---
Your patch gives me error compiling testcase
jh@ryzen3:/tmp> ~/trunk-install/bin/g++ -O3 ~/t.C
In file included from /home/jh/trunk-install/include/c++/14.0.1/vector:65,
from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821
--- Comment #8 from Jan Hubicka ---
I had wrong noexcept specifier. This version works, but I still need to inline
relocate_object_a into the loop
diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821
--- Comment #6 from Jan Hubicka ---
Thanks. I thought relocate_a only cares about whether the pointed-to
type can be bitwise copied. It would be nice to produce memcpy early from
libstdc++ for std::pair, so the second patch makes sense
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
When a loop is converted to a string builtin we lose information about its size.
This means that we won't
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114821
--- Comment #2 from Jan Hubicka ---
What I am shooting for is to optimize it later in loop distribution. We can
recognize a memcpy loop if we can figure out that the source and destination
memory are different.
We can help here with restrict, but I
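
A minimal sketch of the kind of loop meant here, assuming the GNU __restrict qualifier (the helper name copy_elements is hypothetical and not taken from libstdc++): the qualifier promises that the two ranges do not overlap, which is the property a compiler needs before it may treat the loop as a memcpy.

```cpp
#include <cstddef>

// Hypothetical helper for illustration only.
// __restrict (GNU extension in C++) asserts that dst and src never alias,
// so every store to dst[i] is independent of every load from src[j].
void copy_elements (unsigned *__restrict dst,
                    const unsigned *__restrict src,
                    std::size_t n)
{
  // Plain element-wise copy; with non-overlapping ranges this has the
  // same effect as memcpy (dst, src, n * sizeof (unsigned)).
  for (std::size_t i = 0; i < n; i++)
    dst[i] = src[i];
}
```

Whether a given GCC version actually converts this loop depends on the optimization level and passes enabled; the sketch only shows where the no-overlap guarantee comes from.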
: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
In the testcase
#include <vector>
typedef unsigned int uint32_t;
std::pair<uint32_t, uint32_t> pair;
void
test()
{
std::vector<std::pair<uint32_t, uint32_t>> st
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114787
--- Comment #13 from Jan Hubicka ---
-fdump-tree-all-all changing generated code is also bad. We probably should
avoid dumping loop bounds when they are not recorded. I added dumping of loop
bounds and this may be an unexpected side effect. Will
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93008
--- Comment #8 from Jan Hubicka ---
Note that the cold attribute is also quite strong, since it turns on
optimize_size codegen, which is often a lot slower.
Reading the discussion again, I don't think we have a way to make the inline
keyword ignored by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114779
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114774
Jan Hubicka changed:
What|Removed |Added
Summary|Missed DSE in simple code |Missed DSE in simple code
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
In the following
#include <stdio.h>
int a;
short *p;
void
test (int b)
{
a=1;
if (b)
{
(*p)++;
a=2;
printf ("
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
--- Comment #19 from Jan Hubicka ---
I looked into the remaining exit/nonexit rename discussed here earlier before
the PR was closed. The following patch would restore the code to do the same
calls as before my patch
PR
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208
--- Comment #28 from Jan Hubicka ---
So the main problem is that in t2 we have
_ZN6vectorI12QualityValueEC1ERKS1_/7 (vector<_Tp>::vector(const vector<_Tp>&)
[with _Tp = QualityValue])
Type: function definition analyzed alias
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208
--- Comment #27 from Jan Hubicka ---
OK, but the problem is the same. Having comdats with the same key defining
different sets of public symbols is IMO not a good situation for either non-LTO
or LTO builds.
Unless the additional alias is never used by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113208
--- Comment #25 from Jan Hubicka ---
So we have comdat groups that diverge in t1.o and t2.o. In one object the group
has an alias in it while in the other object it does not
Merging nodes for _ZN6vectorI12QualityValueEC2ERKS1_. Candidates:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #8 from Jan Hubicka ---
I am not sure this ought to be P1:
- the compilation technically is finite, but not in reasonable time
- it is possible to adjust the testcase (do early inlining manually) and get
the same infinite build on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
--- Comment #23 from Jan Hubicka ---
The patch looks reasonable. We probably could hash the padding vectors at
summary generation time to reduce WPA overhead, but that can be done
incrementally next stage1.
I however wonder if we really
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109817
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113765
--- Comment #6 from Jan Hubicka ---
Running auto-fdo without guessing branch probabilities is a somewhat odd idea in
general. I suppose we can indeed just avoid setting full_profile flag. Though
the optimization passes are not that much tested
at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #7 from Jan Hubicka ---
Found it, probably. I renamed exit to nonexit (since name was misleading) and
then forgot to update
propagate_threaded_block_debug_into (exit->dest, entry->dest);
I will check this after teaching (w
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
--- Comment #6 from Jan Hubicka ---
On this testcase trunk does get the same dump as gcc13 for the pass just before
ch2; with ch2 we get:
@@ -192,9 +236,8 @@
# DEBUG BEGIN_STMT
goto ; [100.00%]
- [local count: 954449105]:
+ [local count:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596
--- Comment #4 from Jan Hubicka ---
The change makes loop iteration estimates more realistic, but does not
introduce any new code that actually changes the IL, so it seems this makes
an existing problem more visible. I will try to debug what
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #59 from Jan Hubicka ---
Just to explain what happens in the testcase: there is test and testb. They
are almost the same:
int
testb(void)
{
struct bar *fp;
test2 ((void *));
fp = NULL;
(*ptr)++;
test3 ((void *));
}
the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #58 from Jan Hubicka ---
Created attachment 57702
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57702&action=edit
Compare value ranges in jump functions
This patch implements the jump function compare, however it is not good
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #55 from Jan Hubicka ---
> Anyway, can we in the spot my patch changed just walk all
> source->node->callees > cgraph_edges, for each of them find the corresponding
> cgraph_edge in the alias > and for each walk all the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106716
--- Comment #6 from Jan Hubicka ---
The reason why GIMPLE_PREDICT is ignored is that it is never used after ipa-icf
and gets removed at the very beginning of late optimizations.
GIMPLE_PREDICT is consumed by the profile_generate pass which is
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114241
Jan Hubicka changed:
What|Removed |Added
Assignee|unassigned at gcc dot gnu.org |hubicka at gcc dot
gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92387
--- Comment #5 from Jan Hubicka ---
The revision is changing inlining decisions, so it would probably be possible
to reproduce the problem without that change with the right always_inline and
noinline attributes.
at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #3 from Jan Hubicka ---
mine.
The summary is:
loads:
Base 0: alias set 1
Ref 0: alias set 1
access: Parm 0 param offset:4 offset:0 size:64 max_size:64
stores:
Base 0: alias set 1
Ref 0: alias
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85432
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114052
--- Comment #5 from Jan Hubicka ---
So if I understand it right, you want to determine the property that if the
loop header is executed then BB containing undefined behavior at that iteration
will be executed, too.
modref tracks if function
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108802
--- Comment #5 from Jan Hubicka ---
I don't think we can reasonably expect every caller of lambda function to be
early inlined, so we need to extend ipa-prop to understand the obfuscated code.
I discussed that with Martin some time ago - I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111960
--- Comment #5 from Jan Hubicka ---
hmm. cfg.cc:815 for me is:
fputs (", maybe hot", outf);
which seems quite safe.
The problem does not seem to reproduce for me:
jh@ryzen3:~/gcc/build/gcc> ./xgcc -B ./ tt.c -O
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
Jan Hubicka changed:
What|Removed |Added
Summary|[14 regression] ICU |[12/13/14 regression] ICU
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #39 from Jan Hubicka ---
This testcase
#include <stdio.h>
int data[100];
__attribute__((noinline))
int bar (int d, unsigned int d2)
{
if (d2 > 10)
printf ("Bingo\n");
return d + d2;
}
int
test2 (unsigned int i)
{
if (i > 10)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
--- Comment #31 from Jan Hubicka ---
Having a testcase is great. I was just playing with crafting one.
I am still concerned about value ranges in ipa-prop's jump functions.
Let me see if I can modify the testcase to also trigger problem with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #6 from Jan Hubicka ---
Created attachment 57427
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57427&action=edit
patch
The patch makes compilation finish in reasonable time.
I ended up needing to drop DISREGARD_INLINE_LIMITS
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111054
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #5 from Jan Hubicka ---
There is a cap in want_inline_self_recursive_call_p which gives up on inlining
after reaching max recursive inlining depth of 8. Problem is that the tree here
is too wide. After early inlining f0 contains 4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113291
--- Comment #4 from Jan Hubicka ---
There is a cap in want_inline_self_recursive_call_p which gives up on inlining
after reaching max recursive inlining depth of 8. Problem is that the tree here
is too wide. After early inlining f0 contains 4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113907
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787
--- Comment #13 from Jan Hubicka ---
So my understanding is that ivopts does something like
offset = -
and then translate
val = base2[i]
to
val = *((base1+i)+offset)
Where (base1+i) is then an iv variable.
I wonder if we consider doing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113787
--- Comment #8 from Jan Hubicka ---
I will take a look. Mod-ref only reuses the code detecting erroneous paths in
ssa-split-paths, so that code will get confused, too. It makes sense for ivopts
to compute the difference of two memory allocations,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113359
--- Comment #11 from Jan Hubicka ---
If there are two ODR types with the same ODR name, one with an integer and the
other with a pointer type as the third field, then indeed we should get an ODR
warning and give up on handling them as ODR types for type merging.
So
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97119
--- Comment #7 from Jan Hubicka ---
Local aliases are created by the ipa-visibility pass. The most common case is
that a function is declared inline but ELF interposition rules say that the
symbol can be overwritten by a different library. Since GCC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113422
--- Comment #2 from Jan Hubicka ---
Cycling read-only var discovery would be quite expensive, since you need to
interleave it with early opts each round. I wonder how llvm handles this?
I think there is more hope with IPA-PTA getting scalable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113520
--- Comment #8 from Jan Hubicka ---
I think the ipa-cp summaries should be used only when types match. At least
Martin added type streaming for all the jump functions. So we are missing some
check?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110852
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109753
--- Comment #12 from Jan Hubicka ---
I think this is a problem with two meanings of always_inline. One is "it must
be inlined or otherwise we will not be able to generate code", the other is
"disregard inline limits".
I guess practical solution
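
For reference, a small sketch of how the attribute is spelled in GNU C++ (the function names add_one and call_add_one are hypothetical and not from the bug report). The syntax is the same for both meanings, so the source alone does not say which one the author intended:

```cpp
// Hypothetical function for illustration only.
// always_inline asks GCC to inline every direct call regardless of the
// usual size limits.
__attribute__ ((always_inline)) inline int add_one (int x)
{
  return x + 1;
}

// A direct caller; the call to add_one is expected to be inlined here.
int call_add_one (int x)
{
  return add_one (x);
}
```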
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79704
Bug 79704 depends on bug 109811, which changed state.
Bug 109811 Summary: libjxl 0.7 is a lot slower in GCC 13.1 vs Clang 16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
What|Removed |Added
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
Jan Hubicka changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
||2024-01-05
CC||hubicka at gcc dot gnu.org
Status|UNCONFIRMED |NEW
--- Comment #2 from Jan Hubicka ---
On zen3 I get 0.75MP/s for GCC and 0.80MP/s for clang, so only 6.6%, but seems
reproducible.
Profile looks
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235
--- Comment #6 from Jan Hubicka ---
The internal loops are:
static const unsigned keccakf_rotc[24] = {
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18,
39, 61, 20, 44
};
static const unsigned keccakf_piln[24] = {
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235
Jan Hubicka changed:
What|Removed |Added
Summary|SMHasher SHA3-256 benchmark |SMHasher SHA3-256 benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
--- Comment #23 from Jan Hubicka ---
Created attachment 56970
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56970&action=edit
Patch I am testing
Hi,
this adds the -falign-all-functions parameter. It still looks like a more reasonable
(and backward
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92606
--- Comment #31 from Jan Hubicka ---
This is Martin's code, but I agree that equals_wpa should reject pairs with
"dangerous" attributes on them (ideally we should hash them).
I think we could add a test for same attributes to equals_wpa and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81323
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment #9
at gcc dot gnu.org |hubicka at gcc dot
gnu.org
--- Comment #18 from Jan Hubicka ---
Reading all the discussion again, I am leaning towards -falign-all-functions +
documentation update explaining that -falign-functions/-falign-loops are
optimizations and ignored for -Os.
I do use -falign
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110062
--- Comment #11 from Jan Hubicka ---
trunk -O3 -flto -march=native -fopenmp
Operation: Sharpen:
257
256
256
Average: 256 Iterations Per Minute
GCC13 -O3 -flto -march=native -fopenmp
257
256
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #18 from Jan Hubicka ---
I made a typo:
Mainline with -O2 -flto -march=native run manually since build machinery patch
is needed
23.03
22.85
23.04
Should be
Mainline with -O3 -flto -march=native run
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109812
--- Comment #20 from Jan Hubicka ---
On zen4 hardware I now get
GCC13 with -O3 -flto -march=native -fopenmp
2163
2161
2153
Average: 2159 Iterations Per Minute
clang 17 with -O3 -flto -march=native -fopenmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
--- Comment #8 from Jan Hubicka ---
On ARM32 and other targets, methods return the this pointer. Together with
making the return value escape, this probably completely disables any chance
for IPA tracking of C++ data types...
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110015
--- Comment #10 from Jan Hubicka ---
runtimes on zen4 hardware.
trunk -O3 -flto -march=native
42171
42964
42106
clang -O3 -flto -march=native
37393
37423
37508
gcc 13 -O3 -flto -march=native
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #15 from Jan Hubicka ---
With SRA improvements r:aae723d360ca26cd9fd0b039fb0a616bd0eae363 we finally get
good performance at -O2. Improvements to the push_back implementation also help
a bit.
Mainline with default flags (-O2):
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Compiling the following testcase (simplified from repeated
std::vector::push_back expansion):
int *ptr;
void link_error ();
void
test ()
{
int *ptr1 = ptr + 10;
int
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
--- Comment #7 from Jan Hubicka ---
Thanks for the explanation. I think it is quite a common pattern that a new
object is constructed and worked on and later returned, so I think we ought to
handle this correctly.
Another example just came up in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112657
--- Comment #8 from Jan Hubicka ---
The negative return value branch predictor is set to have 98% hitrate (measured
on SPEC2k17 some time ago). There is --param predictable-branch-outcome that
is also set to 2% so indeed we consider the branch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98925
--- Comment #3 from Jan Hubicka ---
Return value range propagation was added in
r:53ba8d669550d3a1f809048428b97ca607f95cf5
however it works on scalar return values only for now. Extending it to
aggregates is a logical next step and should not
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345
Jan Hubicka changed:
What|Removed |Added
CC||hubicka at gcc dot gnu.org
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112653
--- Comment #3 from Jan Hubicka ---
PR82898 testcases seem to be about type based alias analysis. However PTA
should be useable here.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
Bug 109849 depends on bug 110377, which changed state.
Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from
__builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Bug 110287 depends on bug 110377, which changed state.
Bug 110377 Summary: Early VRP and IPA-PROP should work out value ranges from
__builtin_unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377
What|Removed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110377
Jan Hubicka changed:
What|Removed |Added
Resolution|--- |FIXED
Status|NEW
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
In this testcase (loosely based on the libstdc++ implementation of vectors)
we should be able to turn memmove into memcpy because we know
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
Jan Hubicka changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
--- Comment #9 from Jan Hubicka ---
This is _M_realloc_insert at release_ssa time:
Released 63 names, 165.79%, removed 63 holes
void std::vector::_M_realloc_insert (struct vector *
const this, struct iterator __position, const struct pair_t &
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109849
--- Comment #21 from Jan Hubicka ---
Patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
gets us closer to inlining _M_realloc_insert at -O3 (3 insns away)
Patch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110287
--- Comment #8 from Jan Hubicka ---
With return value range propagation
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637265.html
reduces --param max-inline-insns-auto needed for _M_realloc_insert to be
inlined on my testcase from 39
: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
jh@ryzen4:~/gcc/build4/stage1-gcc> cat b.c
/* PR tree-optimization/106433 */
int m, *p;
__attribute__ ((s
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110641
Jan Hubicka changed:
What|Removed |Added
Status|NEW |ASSIGNED
--- Comment #3 from Jan Hubicka
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109811
--- Comment #13 from Jan Hubicka ---
So I re-tested it with current mainline and clang 16/17
For mainline I get (megapixels per second, bigger is better):
13.39
13.38
13.42
clang 16:
20.06
20.06
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59948
--- Comment #8 from Jan Hubicka ---
Trunk optimized stuff return 0, but fails to optimize out functions which
become unused after indirect inlining.
With -fno-early-inlining we end up with:
int m ()
{
void * D.48296;
int __args#0;
struct
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
#include <functional>
using namespace std;
static int dosum(std::function<int(int,int)> fn)
{
return fn(5,6);
}
int test()
{
auto sum = [](int a, int b) {
return a + b
16:07)
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
As seen in
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=471.507.0=473.507.0=475.507.0=477.507.0;
Fix for PR106081
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
This is seen here on tramp3d -fprofile-use
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110973
--- Comment #5 from Jan Hubicka ---
Note that some (not all?) namd scores seem to be back to pre-regression
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.120.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=791.120.0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57
--- Comment #4 from Jan Hubicka ---
So here ipa-modref declares the field dead, while ipa-prop determines its value
even if it is unused and makes it used later?
I think dead argument is probably better than optimizing out one store, so I
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111054
--- Comment #2 from Jan Hubicka ---
This is a missing check for profile presence (we can not convert undefined
probability to sreal). I will fix that.