On Tue, 15 Nov 2022 at 13:34, Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > @item -mrelax-cmpxchg-loop
> > > @opindex mrelax-cmpxchg-loop
> > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> > >-execute pause if load value is not expected. This reduces excessive
> > >-cachline bouncing when and works for all atomic logic fetch builtins
> > >-that generates compare and swap loop.
> > >+For compare and swap loops that emitted by some __atomic_* builtins
> >
> > s/that emitted/that are emitted/
> >
> > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> > >+counterparts), emit an atomic load before cmpxchg instruction. If the
> >
> > s/before cmpxchg/before the cmpxchg/
> >
> > >+loaded value is not equal to expected, execute a pause instead of
> >
> > s/not equal to expected/not equal to the expected/
> >
> > >+directly run the cmpxchg instruction. This might reduce excessive
> >
> > s/directly run/directly running/
>
> This results in "... execute a pause instead of directly running the
> cmpxchg instruction", which needs further TLC because:
>
> * 'a pause' should be 'the PAUSE instruction';
> * 'directly running [an instruction]' does not seem correct in context.
>
> The option also applies to __sync builtins, not just __atomic.
>
>
> How about the following:
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> when restarting the loop.
That's much better, thanks. My only remaining quibble would be that "invoking" an instruction seems only marginally better than running one. Emitting? Issuing? Using? Adding?