On Tue, 15 Nov 2022 at 13:34, Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote:
>
> > > @item -mrelax-cmpxchg-loop
> > > @opindex mrelax-cmpxchg-loop
> > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg,
> > >-execute pause if load value is not expected. This reduces excessive
> > >-cachline bouncing when and works for all atomic logic fetch builtins
> > >-that generates compare and swap loop.
> > >+For compare and swap loops that emitted by some __atomic_* builtins
> >
> > s/that emitted/that are emitted/
> >
> > >+(e.g. __atomic_fetch_(or|and|xor|nand) and their __atomic_*_fetch
> > >+counterparts), emit an atomic load before cmpxchg instruction. If the
> >
> > s/before cmpxchg/before the cmpxchg/
> >
> > >+loaded value is not equal to expected, execute a pause instead of
> >
> > s/not equal to expected/not equal to the expected/
> >
> > >+directly run the cmpxchg instruction. This might reduce excessive
> >
> > s/directly run/directly running/
>
> This results in "... execute a pause instead of directly running the
> cmpxchg instruction", which needs further TLC because:
>
> * 'a pause' should be 'the PAUSE instruction';
> * 'directly running [an instruction]' does not seem correct in context.
>
> The option also applies to __sync builtins, not just __atomic.
>
>
> How about the following:
>
> When emitting a compare-and-swap loop for @ref{__sync Builtins}
> and @ref{__atomic Builtins} lacking a native instruction, optimize
> for the highly contended case by issuing an atomic load before the
> @code{CMPXCHG} instruction, and invoke the @code{PAUSE} instruction
> when restarting the loop.

That's much better, thanks. My only remaining quibble would be that
"invoking" an instruction seems only marginally better than running
one. Emitting? Issuing? Using? Adding?
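
For illustration, the loop shape described by the proposed wording can be
sketched at the C level roughly as follows. This is only a sketch under
stated assumptions, not the code GCC actually emits for
-mrelax-cmpxchg-loop: the helper names cpu_relax and
fetch_nand_relaxed_loop are hypothetical, fetch-NAND is picked as one
example of an operation without a native instruction, and the memory
orders are chosen for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: use the PAUSE instruction on x86, no-op elsewhere. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_ia32_pause();
#endif
}

/* Hypothetical helper: atomic fetch-NAND built from a compare-and-swap
   loop, with an early load before the compare-and-swap and a pause
   whenever the loop restarts.  Returns the value seen before the update. */
static uint32_t fetch_nand_relaxed_loop(uint32_t *ptr, uint32_t mask)
{
    assert(ptr != NULL);

    uint32_t expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    for (;;) {
        uint32_t desired = ~(expected & mask);

        /* Early load: a plain atomic read does not need exclusive
           ownership of the cache line, unlike CMPXCHG. */
        uint32_t current = __atomic_load_n(ptr, __ATOMIC_RELAXED);
        if (current != expected) {
            /* Another thread changed the value; skip the CMPXCHG,
               pause, and restart with the fresh value. */
            expected = current;
            cpu_relax();
            continue;
        }

        /* Strong compare-and-swap; on failure 'expected' is updated
           with the value currently in memory. */
        if (__atomic_compare_exchange_n(ptr, &expected, desired, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
            return expected;

        cpu_relax();
    }
}
```

In the uncontended case the early load costs one extra read; under heavy
contention it avoids issuing a CMPXCHG that is already known to fail.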
