On 23/06/2020 5:44 pm, Thomas Schwinge wrote:
Hi!

On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <k...@codesourcery.com> wrote:
This patch adds support on nvptx for __sync_val_compare_and_swap operations on
1- and 2-byte values.

Is this a thorough review that these are the only functions missing, or
did you just implement what you found missing for some test case you've
been looking into?  Other architectures' similar libgcc files seem to be
defining more of such related functions.


This was to fix a particular test case that used logical reduction operators on subword values. Support for other operators (e.g. arithmetic add) would need additional support.

Other architectures also define atomic operations for subwords in libgcc (e.g. in config/arm/linux-atomic.c for Arm, config/riscv/atomic.c for RISC-V, in config/m68k/linux-atomic.c for Motorola 68000s etc.).

The implementation is a straight copy of the version for
AMD GCN.

(Should thus be generalized?  That can be done later, as far as I'm
concerned -- there already seems to be quite some code duplication in
libgcc.)

I have not verified the algorithm.

It seems a bit unfortunate to have such a thing outlined in a separate
function, given we're talking about performance-critical code here?  Even
more so for GCN, where there's no JIT compiler that can inline it later,
as it's the case for nvptx?

You have verified that GCC itself shouldn't/can't synthesize such
replacement code, inline?

The GCN/nvptx libgcc code actually seems simple enough so that GCC could
synthesize that internally -- or is there a reason not to?  (Just
curious.)


Without the additional __sync_val_compare_and_swap_* functions, the testcase fails with:

unresolved symbol __sync_val_compare_and_swap_2

when the accelerator compiler is run.

The algorithm is generic and fairly simple, so it is more that GCC doesn't, rather than shouldn't or can't. i.e. No one has implemented it as a fallback in lieu of native support so far.

I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state:

     The gimplifier tries
     three alternative code generation strategies.  Whenever possible,
     an atomic update built-in is used.  If that fails, a
     compare-and-swap loop is attempted.  If that also fails, a
     regular critical section around the expression is used.

..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize
that "compare-and-swap loop" code, which looks vaguely similar to your
libgcc implementation.

> ..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look
> similar/related?

These create loops that repeatedly do the compare-and-swap operation until it succeeds. They cannot work to express a smaller compare-and-swap in terms of a larger one since they lack the necessary shifts and masking.

The exit condition is also different - since we are implementing compare-and-swap, the loop can exit even if the swap fails. The loop exits if the subword read from memory is different from the expected value (i.e. the compare-and-swap fails) or if the word containing the subword is equal to the expected value (i.e. it succeeds). The loop is there to cope with the case where the compare-and-swap fails due to part of the larger word changing that is not part of the subword that we are interested in.

..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap'
etc. could be doing such things?


This delegates to expand_atomic_compare_and_swap (in optabs.c), which would be the logical place to synthesize the sub-word loop. All it does ATM is look for atomic_compare_and_swap op handlers, then for sync_compare_and_swap handlers, then library functions for __sync_val_compare_and_swap.

Kwok

Reply via email to