https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 2 Jun 2023, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110082
>
> Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:
>
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>      Ever confirmed|0                           |1
>    Last reconfirmed|                            |2023-06-02
>              Status|UNCONFIRMED                 |NEW
>
> --- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #1)
> > Note that when you do it as
> > proposed the code will appear as having no coverage (the counters will be
> > allocated at the host side but nothing will increment them).
>
> ACK, our customer does understand this.
>
> I infer correctly that the "do it as proposed" does seem fine to you:
>
> (In reply to me from comment #0)
> > My idea is to abstract the "increment the edge execution count" operations
> > into some new GIMPLE/IFN code (?), and then later, once the offloading code
> > has been split off, lower it to the current form (host-side), or no-op
> > (device-side).  I'd appreciate a quick review if that approach makes sense?

Yes, I think this is a reasonable way to do this - I'll note there's
IPA pass analysis that might need adjustments to correctly capture
the semantics of the internal functions.  I suppose you want to
apply this generally, not only to offloaded functions and when
offloading is enabled?

I briefly considered whether it's possible/useful to move profile
instrumentation to the main IPA _transform_ stage but I guess this
will unnecessarily complicate the intricate web of things there.
Profile read for -fprofile-use would then still need to happen
at IPA analysis phase so keeping meta-data between compile and
LTRANS phase in sync to make that work out nicely would be
another challenge.
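
For illustration only, the proposed scheme (an abstract "increment the edge execution count" operation that is lowered after the host/device split) could be sketched on a toy IR roughly as below. This is not GCC code: `Op`, `Stmt`, `Target`, `lower_edge_counts` and `run` are hypothetical names invented for the sketch, and real GIMPLE internal functions and their lowering look different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy IR, NOT GCC's GIMPLE: every name in this sketch is hypothetical.
enum class Op { Work, EdgeCountInc, CounterAdd, Nop };

struct Stmt {
  Op op;
  std::size_t counter;  // counter index; only meaningful for EdgeCountInc / CounterAdd
};

enum class Target { Host, Device };

// Lower the abstract "increment edge counter" placeholder once the
// offloading code has been split off: a real counter update on the
// host side, a no-op on the device side.
std::vector<Stmt> lower_edge_counts(const std::vector<Stmt>& body, Target target) {
  std::vector<Stmt> out;
  out.reserve(body.size());
  for (const Stmt& s : body) {
    if (s.op != Op::EdgeCountInc) {
      out.push_back(s);                            // unrelated statement, keep as-is
    } else if (target == Target::Host) {
      out.push_back({Op::CounterAdd, s.counter});  // current form: bump the counter
    } else {
      out.push_back({Op::Nop, 0});                 // device side: nothing increments it
    }
  }
  return out;
}

// Interpret a lowered body once and return the resulting counter values.
std::vector<std::uint64_t> run(const std::vector<Stmt>& body, std::size_t n_counters) {
  std::vector<std::uint64_t> counters(n_counters);  // value-initialized to zero
  for (const Stmt& s : body) {
    assert(s.op != Op::EdgeCountInc && "placeholder must be lowered first");
    if (s.op == Op::CounterAdd) counters.at(s.counter) += 1;
  }
  return counters;
}
```

Running the same body through both lowerings shows the effect noted in comment #1: the host-side copy updates its counters, while the device-side copy leaves them at zero and so appears to have no coverage.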