https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698
Martin Liška <marxin at gcc dot gnu.org> changed:

           What            |Removed                       |Added
----------------------------------------------------------------------------
             Status        |UNCONFIRMED                   |NEW
   Last reconfirmed        |                              |2017-04-12
                 CC        |                              |marxin at gcc dot gnu.org
           Assignee        |unassigned at gcc dot gnu.org |marxin at gcc dot gnu.org
     Ever confirmed        |0                             |1

--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed. Actually, I talked with Honza last week about the usage of working
sets and the problems it has. Your sample nicely illustrates one of them: when
you have a really dominant edge and the sum of the rest of the program is very
small, the hotness threshold becomes equal to the execution count of that
maximal edge. That is obviously very wrong, even in the case where loop
unrolling asks for a split edge with a fraction of the frequency.

Just for your information, before the current state (r193747) we used to
compute the hotness threshold as follows:

  profile_info->sum_max / PARAM_VALUE (HOT_BB_COUNT_FRACTION)

where the param used to have the value 10000, which results in the value 100.
By the way, I believe the value should also be divided by profile_info->runs
(the number of runs), as sum_max grows with the number of times the binary is
executed.

The second issue I see is the quite large performance overhead of the
instrumented run. For programs that are executed repeatedly, one can see in
perf top:

  11.23%  git           [.] gcov_do_dump
   9.14%  git           [.] __gcov_write_summary
   5.60%  libc-2.25.so  [.] __memset_sse2_unaligned_erms
   3.82%  git           [.] __gcov_read_summary

and of course the summary data occupies space both in the profile and in the
instrumented binary.

That said, I'm planning to test the original mechanism and compare it to the
current one. Then we could add an option to switch between the two methods.