https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108552
--- Comment #45 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Linus Torvalds from comment #43)
> (In reply to Richard Biener from comment #42)
> >
> > I think if we want to avoid doing optimizations on gcov counters we should
> > make them volatile.
>
> Honestly, that sounds like the cleanest and safest option to me.
>
> That said, with the gcov counters apparently also being 64-bit, I suspect it
> will create some truly horrid code generation.
>
> Presumably you'd end up getting a lot of load-load-add-adc-store-store
> instruction patterns, which is not just six instructions when just two
> should do - it also uses up two registers.
>
> So while it sounds like the simplest and safest model, maybe it just makes
> code generation too unbearably bad?
>
> Maybe nobody who uses gcov would care. But I suspect it might be quite the
> big performance regression, to the point where even people who thought they
> don't care will go "that's a bit much".
>
> I wonder if there is some half-way solution that would allow at least a
> load-add-store-load-adc-store instruction sequence, which would then mean
> (a) one less register wasted and (b) potentially allow some peephole
> optimization turning it into just a addmem-adcmem instruction pair.
>
> Turning just the one of the memops into a volatile access might be enough
> (eg just the load, but not the store?)

It might be possible to introduce something like a __volatile_inc ()
which implements a somewhat relaxed "volatile".

For user code

volatile long long x;
void foo () { x++; }

emitting inc + adc with memory operands is only "incorrect" in
re-ordering the subword reads with the subword writes, the reads
and writes still happen architecturally ...

That said, the coverage code could make this re-ordering explicit
for 32bit with some conditional code (add-with-overflow) that eventually
combines back nicely even with volatile ...
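
To make the last point concrete, here is a minimal sketch of what an
explicitly re-ordered 64-bit increment could look like on a 32-bit target.
It is one possible reading of the idea, not what GCC's coverage code emits.
The names counter64, counter_inc, counter_read and counter_selftest are
hypothetical, and the split of the counter into two 32-bit halves is an
assumption made for illustration. The only non-standard call is the GCC
builtin __builtin_add_overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 64-bit counter as two volatile 32-bit halves.
   This is an illustration only, not gcov's actual counter layout.  */
struct counter64
{
  volatile uint32_t lo;
  volatile uint32_t hi;
};

/* Increment with the sub-word accesses ordered explicitly:
   load lo, add, store lo, then load hi, add carry, store hi.  */
static inline void
counter_inc (struct counter64 *c)
{
  uint32_t lo;
  /* The builtin returns nonzero when lo + 1 wraps around 32 bits.  */
  int carry = __builtin_add_overflow (c->lo, 1u, &lo);
  c->lo = lo;
  /* Unconditional update of the high half; guarding this statement
     with "if (carry)" would be the conditional variant.  */
  c->hi = c->hi + carry;
}

/* Reassemble the 64-bit value from the two halves.  */
static inline uint64_t
counter_read (const struct counter64 *c)
{
  return ((uint64_t) c->hi << 32) | c->lo;
}

/* Small self-check: the carry must propagate into the high half.  */
static inline void
counter_selftest (void)
{
  struct counter64 c = { 0xffffffffu, 0 };
  counter_inc (&c);
  assert (counter_read (&c) == 0x100000000ull);
}
```

Because each half is read once and written once, in that order, the
volatile accesses already match the load-add-store-load-adc-store sequence
from comment #43. Whether the compiler then combines them into add/adc with
memory operands is not something this sketch can promise.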