> On 1/21/21 9:25 AM, Richard Biener wrote: > > On Wed, Jan 20, 2021 at 5:25 PM Martin Liška <mli...@suse.cz> wrote: > > > > > > On 1/20/21 5:00 PM, Jan Hubicka wrote: > > > > There are two thinks that I would like to discuss first > > > > 1) I think the option is stil used for value profiling of divisors > > > > > > It's not. Right now the only usage lives in get_nth_most_common_value > > > which > > > is an entry point being used for stringops, indirect call and divmod > > > histograms. > > > > > > > 2) it is not clear to me how the new counter establishes > > > > reproducibility for indiect calls that have more than 32 targets > > > > during > > > > the train run. > > > > > > We cannot ensure, but I would say that it's very unlikely to happen. > > > In case of Firefox, there are definitely other reasons why the build is > > > not reproducible. > > > I would expect arc counters to differ in between multiple training runs. > > > > > > If it's really a problem we can come up with other approaches: > > > - GCOV run-time control over # of tracked values (32 right now) > > > - similarly we can save more values in .gcda files > > > > > > I'm sending updated version of the patch. > > > > So the discussion tells me that we want the option and of course want to > > have > > it work. > > Yep, I've just sent patch for that. > > > In the patch I see the =multithreaded enum part was not documented > > (I don't see how it differs from =parallel-runs?), so I wonder if we really > > need > > three states. > > It's a reserved option value Honza though will be useful for the future (:
Not exactly - I intended it to work already in gcc10 as follows. - With =serial one can trust all counters coming from gcda file (I looked again in details to gcc10 implementation and I think the handling of sign bit is correct, contrary to my previous claim) - With =paralel-runs we can use the sign bit trick to signalize that merging was lossy - With =multithreaded we can only use the counter if sum of individual targets matches the total number of executions (so we know no target got lost). Basically =serial means that you get reproducible profile only if the events (profile counter invocation, profile streaming) come in precisely same order in both train runs. (Such as profiledbootstrap running with make -j1) With =parallel-run you get reproducible profile under the assumption that train run consist of multiple invocations (or does forking), each invocation is reproducible but streaming happens in random order (profiledbootstrap with make -j16). With =multithreaded you get reproducible profile if the profile counter invocations match in both train runs, but they can happen in any order (profiledbootstrap with make -j16 once we make gcc multithreaded, or build of clang with FDO). > > > > > That said, the option handling is indeed broken at the moment. While > > the implementation is not perfect, does it have some pieces that help? > > > > That said, why not simply fix option handling by adding the missing = > > to the option? Using -fprofile-reproducibleserial etc. work but before > > adding backwards compatibility aliases I'd say we change w/o them > > for GCC 11 and only if there are complaints introduce them (and eventually > > backport the option handling fix to GCC 10). > > Yes, I would like to backport the option fix for GCC 10. But apparently, > there's > nobody using the option. I think easy way to get users of this option is to make profile not reproducible by default and modify packages to use right reproducibility option when reproducible builds are intended. It is not feature that comes for free and I think most users of PGO does not care, so I think it should be opt in. In general getting profile reroducible one needs to make train reproducible that is hard when you look at details (such as /tmp/ file name generation issue in gcc) and may lead to need for user to annotate such code. This will become more common problem for multithreaded profiles where one needs to annotate locking and busy waiting loops in them for example (or the scheduler responsible for executing paralle tasks). I can see this to be practically achievable but we probably want to produce some guidelines for doing that and probably teach gcov-tool to compare profiles and say to which degree they match (i.e. which functions match for each of levels of reproducibility). The problem is that profiles are continuous and the errors too, but optimizaitons looks for certain thresholds, so small errors may lead to code changes, so I think our current method of looking at relatively few packages and patching errors when they appear is not very good long term strategy... Especially if it makes us to drop useful transformations by default with -fprofile-use and no additional option. Honza > > Martin > > > > > Richard. > > > > > Martin >