Re: [PATCH] nvptx: Add support for subword compare-and-swap

Kwok Cheung Yeung Wed, 24 Jun 2020 04:15:02 -0700

On 23/06/2020 5:44 pm, Thomas Schwinge wrote:

Hi!


On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <k...@codesourcery.com> wrote:

This patch adds support on nvptx for __sync_val_compare_and_swap operations on
1- and 2-byte values.


Is this a thorough review that these are the only functions missing, or
did you just implement what you found missing for some test case you've
been looking into?  Other architectures' similar libgcc files seem to be
defining more of such related functions.

This was to fix a particular test case that used logical reduction operators onsubword values. Support for other operators (e.g. arithmetic add) would needadditional support.

Other architectures also define atomic operations for subwords in libgcc (e.g.in config/arm/linux-atomic.c for Arm, config/riscv/atomic.c for RISC-V, inconfig/m68k/linux-atomic.c for Motorola 68000s etc.).

The implementation is a straight copy of the version for
AMD GCN.


(Should thus be generalized?  That can be done later, as far as I'm
concerned -- there already seems to be quite some code duplication in
libgcc.)

I have not verified the algorithm.

It seems a bit unfortunate to have such a thing outlined in a separate
function, given we're talking about performance-critical code here?  Even
more so for GCN, where there's no JIT compiler that can inline it later,
as it's the case for nvptx?

You have verified that GCC itself shouldn't/can't synthesize such
replacement code, inline?

The GCN/nvptx libgcc code actually seems simple enough so that GCC could
synthesize that internally -- or is there a reason not to?  (Just
curious.)

Without the additional __sync_val_compare_and_swap_* functions, the testcasefails with:


unresolved symbol __sync_val_compare_and_swap_2

when the accelerator compiler is run.

The algorithm is generic and fairly simple, so it is more that GCC doesn't,rather than shouldn't or can't. i.e. No one has implemented it as a fallback inlieu of native support so far.

I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state:

     The gimplifier tries
     three alternative code generation strategies.  Whenever possible,
     an atomic update built-in is used.  If that fails, a
     compare-and-swap loop is attempted.  If that also fails, a
     regular critical section around the expression is used.

..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize
that "compare-and-swap loop" code, which looks vaguely similar to your
libgcc implementation.

> ..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look
> similar/related?

These create loops that repeatedly do the compare-and-swap operation until itsucceeds. They cannot work to express a smaller compare-and-swap in terms of alarger one since they lack the necessary shifts and masking.

The exit condition is also different - since we are implementingcompare-and-swap, the loop can exit even if the swap fails. The loop exits ifthe subword read from memory is different from the expected value (i.e. thecompare-and-swap fails) or if the word containing the subword is equal to theexpected value (i.e. it succeeds). The loop is there to cope with the case wherethe compare-and-swap fails due to part of the larger word changing that is notpart of the subword that we are interested in.

..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap'
etc. could be doing such things?

This delegates to expand_atomic_compare_and_swap (in optabs.c), which would bethe logical place to synthesize the sub-word loop. All it does ATM is look foratomic_compare_and_swap op handlers, then for sync_compare_and_swap handlers,then library functions for __sync_val_compare_and_swap.


Kwok

Re: [PATCH] nvptx: Add support for subword compare-and-swap

Reply via email to