Hi!

On 2020-06-15T21:28:12+0100, Kwok Cheung Yeung <k...@codesourcery.com> wrote:
> This patch adds support on nvptx for __sync_val_compare_and_swap operations on
> 1- and 2-byte values.

Is this a thorough review that these are the only functions missing, or
did you just implement what you found missing for some test case you've
been looking into?  Other architectures' similar libgcc files seem to be
defining more of such related functions.

Per the PTX 3.1 manual that I looked into, I see for CAS it supports:
'.b32', '.b64'.  We've got:

    $ build-gcc-offload-nvptx-none/gcc/xgcc -Bbuild-gcc-offload-nvptx-none/gcc 
-dM -E -x c /dev/null | grep -i compare.and.swap
    #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
    #define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1

..., so that would match the PTX 3.1 manual.  GCC also seems to know
about 16-byte ("'.b128'", which doesn't exist in PTX).  A quick run of
your testcase with 'GENERATE_TEST(__int128)' seems to just work, but I
haven't verified why/how.

> The implementation is a straight copy of the version for
> AMD GCN.

(Should thus be generalized?  That can be done later, as far as I'm
concerned -- there already seems to be quite some code duplication in
libgcc.)

I have not verified the algorithm.

It seems a bit unfortunate to have such a thing outlined in a separate
function, given we're talking about performance-critical code here?  Even
more so for GCN, where there's no JIT compiler that can inline it later,
as it's the case for nvptx?

You have verified that GCC itself shouldn't/can't synthesize such
replacement code, inline?

The GCN/nvptx libgcc code actually seems simple enough so that GCC could
synthesize that internally -- or is there a reason not to?  (Just
curious.)

I see 'gcc/doc/generic.texi', "OMP_ATOMIC" state:

    The gimplifier tries
    three alternative code generation strategies.  Whenever possible,
    an atomic update built-in is used.  If that fails, a
    compare-and-swap loop is attempted.  If that also fails, a
    regular critical section around the expression is used.

..., and I see 'gcc/omp-expand.c:expand_omp_atomic_pipeline' synthesize
that "compare-and-swap loop" code, which looks vaguely similar to your
libgcc implementation.

..., or maybe even 'gcc/builtins.c:expand_builtin_compare_and_swap'
etc. could be doing such things?

..., and 'gcc/optabs.c:expand_compare_and_swap_loop' etc. also look
similar/related?

I clearly don't know the history behind all of that.  :-|

> I have added a new libgomp test that exercises the new operation.

Ideally we should also have nvptx target compile-time test, and scan the
PTX assembler code generated -- but we don't have that for a lot of
things, so this is probably OK without, too?

> I have also
> verified that the new code does not cause any regressions on the nvptx
> offloading tests, and that the new test passes with both nvptx and amdgcn as
> offload targets.
>
> Okay for master and OG10?

Given that this solves an actual problem, and we have precedence with the
GCN target doing the same thing, this seems OK (but I can't formally
approve).


Grüße
 Thomas


> commit 7c3a9c23ba9f5b8fe953aa5492ae75617f2444a3
> Author: Kwok Cheung Yeung <k...@codesourcery.com>
> Date:   Mon Jun 15 12:34:55 2020 -0700
>
>     nvptx: Add support for subword compare-and-swap
>
>     2020-06-15  Kwok Cheung Yeung  <k...@codesourcery.com>
>
>       libgcc/
>       * config/nvptx/atomic.c: New.
>       * config/nvptx/t-nvptx (LIB2ADD): Add atomic.c.
>
>       libgomp/
>       * testsuite/libgomp.c-c++-common/reduction-16.c: New.
>
> diff --git a/libgcc/config/nvptx/atomic.c b/libgcc/config/nvptx/atomic.c
> new file mode 100644
> index 0000000..4becbd2
> --- /dev/null
> +++ b/libgcc/config/nvptx/atomic.c
> @@ -0,0 +1,59 @@
> +/* NVPTX atomic operations
> +   Copyright (C) 2020 Free Software Foundation, Inc.
> +   Contributed by Mentor Graphics.
> +
> +   This file is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by the
> +   Free Software Foundation; either version 3, or (at your option) any
> +   later version.
> +
> +   This file is distributed in the hope that it will be useful, but
> +   WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   General Public License for more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <stdbool.h>
> +
> +#define __SYNC_SUBWORD_COMPARE_AND_SWAP(TYPE, SIZE)                       \
> +                                                                          \
> +TYPE                                                                      \
> +__sync_val_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)     
> \
> +{                                                                         \
> +  unsigned int *wordptr = (unsigned int *)((__UINTPTR_TYPE__ ) ptr & ~3UL);  
> \
> +  int shift = ((__UINTPTR_TYPE__ ) ptr & 3UL) * 8;                        \
> +  unsigned int valmask = (1 << (SIZE * 8)) - 1;                              
>      \
> +  unsigned int wordmask = ~(valmask << shift);                               
>      \
> +  unsigned int oldword = *wordptr;                                        \
> +  for (;;)                                                                \
> +    {                                                                        
>      \
> +      TYPE prevval = (oldword >> shift) & valmask;                        \
> +      if (__builtin_expect (prevval != oldval, 0))                        \
> +     return prevval;                                                      \
> +      unsigned int newword = oldword & wordmask;                          \
> +      newword |= ((unsigned int) newval) << shift;                        \
> +      unsigned int prevword                                               \
> +       = __sync_val_compare_and_swap_4 (wordptr, oldword, newword);       \
> +      if (__builtin_expect (prevword == oldword, 1))                      \
> +     return oldval;                                                       \
> +      oldword = prevword;                                                 \
> +    }                                                                        
>      \
> +}                                                                         \
> +                                                                          \
> +bool                                                                      \
> +__sync_bool_compare_and_swap_##SIZE (TYPE *ptr, TYPE oldval, TYPE newval)    
> \
> +{                                                                         \
> +  return __sync_val_compare_and_swap_##SIZE (ptr, oldval, newval) == oldval; 
> \
> +}
> +
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned char, 1)
> +__SYNC_SUBWORD_COMPARE_AND_SWAP (unsigned short, 2)
> +
> diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
> index c4d20c9..ede0bf0 100644
> --- a/libgcc/config/nvptx/t-nvptx
> +++ b/libgcc/config/nvptx/t-nvptx
> @@ -1,5 +1,6 @@
>  LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
> -     $(srcdir)/config/nvptx/mgomp.c
> +     $(srcdir)/config/nvptx/mgomp.c \
> +     $(srcdir)/config/nvptx/atomic.c
>
>  LIB2ADDEH=
>  LIB2FUNCS_EXCLUDE=__main
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c 
> b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> new file mode 100644
> index 0000000..951e522
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-16.c
> @@ -0,0 +1,46 @@
> +/* { dg-do run } */
> +
> +#include <stdlib.h>
> +
> +#define N 512
> +
> +#define GENERATE_TEST(T)     \
> +int test_##T (void)          \
> +{                            \
> +  T a[N], res = 0;           \
> +                             \
> +  for (int i = 0; i < N; ++i)        \
> +    a[i] = i & 1;            \
> +                             \
> +_Pragma("omp target teams distribute reduction(||:res) 
> defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)        \
> +    res = res || a[i];               \
> +                             \
> +  /* res should be non-zero.  */\
> +  if (!res)                  \
> +    return 1;                        \
> +                             \
> +_Pragma("omp target teams distribute reduction(&&:res) 
> defaultmap(tofrom:scalar)") \
> +  for (int i = 0; i < N; ++i)        \
> +    res = res && a[i];               \
> +                             \
> +  /* res should be zero.  */ \
> +  return res;                        \
> +}
> +
> +GENERATE_TEST(char)
> +GENERATE_TEST(short)
> +GENERATE_TEST(int)
> +GENERATE_TEST(long)
> +
> +int main(void)
> +{
> +  if (test_char ())
> +    abort ();
> +  if (test_short ())
> +    abort ();
> +  if (test_int ())
> +    abort ();
> +  if (test_long ())
> +    abort ();
> +}
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter

Reply via email to