On 09/05/2018 07:30 AM, Tom de Vries wrote: > On 09/05/2018 12:19 AM, Cesar Philippidis wrote: >> On 09/02/2018 07:57 AM, Cesar Philippidis wrote: >>> On 09/01/2018 12:04 PM, Tom de Vries wrote: >>>> On 08/31/2018 04:14 PM, Cesar Philippidis wrote: >>> >>>>> Is this patch OK for trunk? >>>>> >>>> >>>> Well, how did you test this ( >>>> https://gcc.gnu.org/contribute.html#patches : "Bootstrapping and >>>> testing. State the host and target combinations you used to do proper >>>> testing as described above, and the results of your testing.") ? >>> >>> I tested the standalone nvptx compiler. I'll retest with libgomp with >>> -misa=sm_35. Bootstrapping won't help much here, unfortunately. >>>>> +++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c >>>>> @@ -0,0 +1,24 @@ >>>>> +/* Test the nvptx atomic instructions for __atomic_fetch_OP for SM_35 >>>>> + targets. */ >>>>> + >>>>> +/* { dg-do compile } */ >>>>> +/* { dg-options "-O2 -misa=sm_35" } */ >>>>> + >>>>> +int >>>>> +main() >>>>> +{ >>>>> + unsigned long long a = ~0; >>>>> + unsigned b = 0xa; >>>>> + >>>>> + __atomic_fetch_add (&a, b, 0); >>>>> + __atomic_fetch_and (&a, b, 0); >>>>> + __atomic_fetch_or (&a, b, 0); >>>>> + __atomic_fetch_xor (&a, b, 0); >>>>> + >>>>> + return a; >>>>> +} >>>>> + >>>>> +/* { dg-final { scan-assembler "atom.add.u64" } } */ >>>>> +/* { dg-final { scan-assembler "atom.b64.and" } } */ >>>>> +/* { dg-final { scan-assembler "atom.b64.or" } } */ >>>>> +/* { dg-final { scan-assembler "atom.b64.xor" } } */ >>>>> -- 2.17.1 >>>>> >>>> >>>> Hmm, the add.u64 vs b64.and looks odd (and the scan-assembler-not >>>> testcase does not use this difference, so that needs to be fixed, or for >>>> bonus points, changed into a scan-assembler testcase). >>>> >>>> The documentation uses "op.type", we should fix the compiler to emit >>>> that consistently. Separate patch that fixes that pre-approved. >>> >>> ACK. I think there are a lot of other cases like that in the BE. >>> >>>> This is ok (with, as I mentioned above, the SI part split off into a >>>> separate patch), on the condition that you test libgomp with >>>> -foffload=-misa=sm_35. >> >> Adding -foffload=misa=sm_35 didn't work because the host gcc doesn't >> support the -misa flag. > > That doesn't make sense to me. For me this works without any problems. > Have you tried a clean build?
I was incorrectly setting ALWAYS_CFLAGS to use -foffload=-misa=sm_35. That didn't work on the host. But lappend'ing tagopt did work. >> When I forced the nvptx BE to set TARGET_SM35 to >> always be true, I ran into problems with SM_30 code linking against >> SM_35 code. > > I also cannot reproduce this, works for me. I found the problem. I wasn't using a clean build. Besides, with the tagopt change in libgomp, I didn't need to force the -misa=sm_35 flag everywhere. >> Therefore, I don't think this patch is ready for trunk yet. >>> By the way, is -misa really necessary for atomic_fetch_<logic><mode>? >> Looking at the PTX documentation I see >> <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#changes-in-ptx-isa-version-3-1>: >> >> PTX ISA version 3.1 introduces the following new features: >> >> * Support for sm_35 target architecture. >> * Extends atomic and reduction instructions to perform 64-bit {and, or, >> xor} operations, and 64-bit integer {min, max} operations. >> >> Is there a table for which list which GPUs are compatible with which >> instructions? > > Yes, every instruction has a table in the ptx manual, and there's a "PTX > ISA Notes" entry. > > For the atom instruction in ptx isa 3.1 manual, we have "PTX ISA Notes": > ... > atom.global requires sm_11 or higher. > atom.shared requires sm_12 or higher. > 64-bit atom.global.{add,cas,exch} require sm_12 or higher. > 64-bit atom.shared.{add,cas,exch} require sm_20 or higher. > 64-bit atom.{and,or.xor,min,max} require sm_35 or higher. > atom.add.f32 requires sm_20 or higher. > Use of generic addressing requires sm_20 or higher. > ... Thanks! I'll commit the attached patch shortly. x86_64 with nvptx offloading regression testing didn't yield any new failures, nor did the standalone nvptx testing. I'll follow up with an SImode patch later. Cesar
[nvptx] Basic -misa support for nvptx 2018-XX-YY Cesar Philippidis <ce...@codesourcery.com> Bernd Schmidt <bernds_...@t-online.de> gcc/ * config/nvptx/nvptx-opts.h: New file. * config/nvptx/nvptx.c (nvptx_file_start): Print the correct .target. * config/nvptx/nvptx.h: Include "nvptx-opts.h". (ASM_SPEC): Define. (TARGET_SM35): New macro. * config/nvptx/nvptx.md (atomic_fetch_<logic><mode>): Enable with the correct predicate. * config/nvptx/nvptx.opt (ptx_isa, sm_30, sm_35): New enum and its values. (misa=): New option. * doc/invoke.texi (Nvidia PTX Options): Document -misa. gcc/testsuite/ * gcc.target/nvptx/atomic_fetch-1.c: New test. * gcc.target/nvptx/atomic_fetch-1.c: New test. (cherry picked from gomp-4_0-branch r223182) --- gcc/config/nvptx/nvptx-opts.h | 30 +++++++++++++++++++ gcc/config/nvptx/nvptx.c | 5 +++- gcc/config/nvptx/nvptx.h | 8 +++++ gcc/config/nvptx/nvptx.md | 3 +- gcc/config/nvptx/nvptx.opt | 14 +++++++++ gcc/doc/invoke.texi | 6 ++++ .../gcc.target/nvptx/atomic-fetch-2.c | 24 +++++++++++++++ .../gcc.target/nvptx/atomic_fetch-1.c | 24 +++++++++++++++ 8 files changed, 111 insertions(+), 3 deletions(-) create mode 100644 gcc/config/nvptx/nvptx-opts.h create mode 100644 gcc/testsuite/gcc.target/nvptx/atomic-fetch-2.c create mode 100644 gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h new file mode 100644 index 00000000000..55d9599917e --- /dev/null +++ b/gcc/config/nvptx/nvptx-opts.h @@ -0,0 +1,30 @@ +/* Definitions for the NVPTX port needed for option handling. + Copyright (C) 2015-2018 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +#ifndef NVPTX_OPTS_H +#define NVPTX_OPTS_H + +enum ptx_isa +{ + PTX_ISA_SM30, + PTX_ISA_SM35 +}; + +#endif + diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index c0b0a2ec3ab..9903a273863 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -4931,7 +4931,10 @@ nvptx_file_start (void) { fputs ("// BEGIN PREAMBLE\n", asm_out_file); fputs ("\t.version\t3.1\n", asm_out_file); - fputs ("\t.target\tsm_30\n", asm_out_file); + if (TARGET_SM35) + fputs ("\t.target\tsm_35\n", asm_out_file); + else + fputs ("\t.target\tsm_30\n", asm_out_file); fprintf (asm_out_file, "\t.address_size %d\n", GET_MODE_BITSIZE (Pmode)); fputs ("// END PREAMBLE\n", asm_out_file); } diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index dfa1e9aa859..a2fe8b68b22 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -21,10 +21,16 @@ #ifndef GCC_NVPTX_H #define GCC_NVPTX_H +#ifndef NVPTX_OPTS_H +#include "config/nvptx/nvptx-opts.h" +#endif + /* Run-time Target. */ #define STARTFILE_SPEC "%{mmainkernel:crt0.o}" +#define ASM_SPEC "%{misa=*:-m %*}" + #define TARGET_CPU_CPP_BUILTINS() \ do \ { \ @@ -87,6 +93,8 @@ #define Pmode (TARGET_ABI64 ? DImode : SImode) #define STACK_SIZE_MODE Pmode +#define TARGET_SM35 (ptx_isa_option >= PTX_ISA_SM35) + /* Registers. Since ptx is a virtual target, we just define a few hard registers for special purposes and leave pseudos unallocated. We have to have some available hard registers, to keep gcc setup diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index 2988f5dfa91..dd6032d021b 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -1440,7 +1440,6 @@ (define_code_iterator any_logic [and ior xor]) (define_code_attr logic [(and "and") (ior "or") (xor "xor")]) -;; Currently disabled until we add better subtarget support - requires sm_32. (define_insn "atomic_fetch_<logic><mode>" [(set (match_operand:SDIM 1 "memory_operand" "+m") (unspec_volatile:SDIM @@ -1450,7 +1449,7 @@ UNSPECV_LOCK)) (set (match_operand:SDIM 0 "nvptx_register_operand" "=R") (match_dup 1))] - "0" + "TARGET_SM35" "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;" [(set_attr "atomic" "true")]) diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt index 04277d1d98e..8194c0324d6 100644 --- a/gcc/config/nvptx/nvptx.opt +++ b/gcc/config/nvptx/nvptx.opt @@ -48,3 +48,17 @@ Generate code that can keep local state uniform across all lanes. mgomp Target Report Mask(GOMP) Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt. + +Enum +Name(ptx_isa) Type(int) +Known PTX ISA versions (for use with the -misa= option): + +EnumValue +Enum(ptx_isa) String(sm_30) Value(PTX_ISA_SM30) + +EnumValue +Enum(ptx_isa) String(sm_35) Value(PTX_ISA_SM35) + +misa= +Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) Init(PTX_ISA_SM30) +Specify the version of the ptx ISA to use. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e37233d6ed4..95bc7d9e509 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -22456,6 +22456,12 @@ These options are defined for Nvidia PTX: @opindex m64 Generate code for 32-bit or 64-bit ABI. +@item -misa=@var{ISA-string} +@opindex march +Generate code for given the specified PTX ISA (e.g.@ @samp{sm_35}). ISA +strings must be lower-case. Valid ISA strings include @samp{sm_30} and +@samp{sm_35}. The default ISA is sm_30. + @item -mmainkernel @opindex mmainkernel Link in code for a __main kernel. This is for stand-alone instead of diff --git a/gcc/testsuite/gcc.target/nvptx/atomic-fetch-2.c b/gcc/testsuite/gcc.target/nvptx/atomic-fetch-2.c new file mode 100644 index 00000000000..1d35a176a62 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/atomic-fetch-2.c @@ -0,0 +1,24 @@ +/* Test the nvptx atomic instructions for __atomic_fetch_OP for SM_30 + targets. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -misa=sm_30" } */ + +int +main() +{ + unsigned long long a = ~0; + unsigned b = 0xa; + + __atomic_fetch_add (&a, b, 0); + __atomic_fetch_and (&a, b, 0); + __atomic_fetch_or (&a, b, 0); + __atomic_fetch_xor (&a, b, 0); + + return a; +} + +/* { dg-final { scan-assembler-not "atom.b64.add" } } */ +/* { dg-final { scan-assembler-not "atom.b64.and" } } */ +/* { dg-final { scan-assembler-not "atom.b64.or" } } */ +/* { dg-final { scan-assembler-not "atom.b64.xor" } } */ diff --git a/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c new file mode 100644 index 00000000000..c637caa79a0 --- /dev/null +++ b/gcc/testsuite/gcc.target/nvptx/atomic_fetch-1.c @@ -0,0 +1,24 @@ +/* Test the nvptx atomic instructions for __atomic_fetch_OP for SM_35 + targets. */ + +/* { dg-do compile } */ +/* { dg-options "-O2 -misa=sm_35" } */ + +int +main() +{ + unsigned long long a = ~0; + unsigned b = 0xa; + + __atomic_fetch_add (&a, b, 0); + __atomic_fetch_and (&a, b, 0); + __atomic_fetch_or (&a, b, 0); + __atomic_fetch_xor (&a, b, 0); + + return a; +} + +/* { dg-final { scan-assembler "atom.add.u64" } } */ +/* { dg-final { scan-assembler "atom.b64.and" } } */ +/* { dg-final { scan-assembler "atom.b64.or" } } */ +/* { dg-final { scan-assembler "atom.b64.xor" } } */ -- 2.17.1