Hi!

On 2024-09-23T09:22:55+0200, Richard Biener <richard.guent...@gmail.com> wrote:
> On Fri, Sep 20, 2024 at 6:50 PM Thomas Schwinge <tschwi...@baylibre.com> wrote:
>> (This is orthogonal to yesterday's
>> "GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated".)
>>
>> We'd like to raise nvptx code generation from PTX ISA 6.0, sm_30 "Kepler"
>> to default PTX ISA 7.3, sm_52 "Maxwell", therefore CUDA 11.3 (2021-04).
>> This is, primarily, so that we're able to use 'alloca' and related stack
>> manipulation instructions, and improve upon the current:
>>
>>     sorry ("target cannot support alloca");
>>
>> I see, for example:
>>
>>   - Ubuntu 22.04 "jammy" LTS has 11.5.1-1ubuntu1 packaged
>>   - Debian 12 "stable" ("bookworm", 2023-06) has 11.8.89~11.8.0-5~deb12u1 packaged
>>
>> ..., and sm_52 "Maxwell" has been supported as of CUDA 6.5 (2014-08), and
>> thus supported by most Nvidia GPUs of the last decade, approximately.
>>
>> The question is, whether we continue to build by default also the current
>> sm_30 "Kepler" multilib variants (to be available for use via
>> building/linking with '-march=sm_30'), or if that's truly obsolete by
>> now, and need no longer be available by default?  (It has been deprecated
>> for a long time, and sm_3x "Kepler architecture support is removed from
>> CUDA 12.0", 2022-12.)  There's always the 'configure'-time
>> '--with-arch=sm_30' if you (additionally to sm_52) still need your target
>> libraries built for sm_30 multilib variants; I would argue that's
>> sufficient?
>
> I seem to have Turing so the change works for me

ACK.

> How would the user
> experience be with using the sm_30 multilib?

Let's consider the cases separately, under the assumption that the user's
Nvidia GPU doesn't support sm_52 or higher:

(1) GCC/nvptx 'configure'd without '--with-arch=[...]' (that is, default
'--with-arch=sm_52'), and there are no sm_30 multilib variants:
offloading codes build fine (for sm_52), but at run time libgomp either
(to be decided) emits an error message or silently ignores the device as
incapable.  (That's effectively the same scenario as if you build with
the wrong '-march=[...]' for GCC/GCN offloading.  Probably we should
mirror for nvptx how GCN behaves.)  <https://gcc.gnu.org/PR71646>
"incompability between ptx code and GPU hardware" should get resolved
here.

(2) GCC/nvptx 'configure'd with '--with-arch=sm_30': generally,
offloading codes continue to build and execute as they do now; only for
specific upcoming OpenACC functionality (array/struct reductions) may you
run into compile-time 'sorry ("target cannot support alloca");'.  (...,
which we may evolve into a more helpful error message.)

(3) GCC/nvptx 'configure'd without '--with-arch=[...]' (that is, default
'--with-arch=sm_52'), but we do build sm_30 multilib variants (either
(3a) by default or (3b) upon 'configure'-time request via an additional
option: '--with-multilib-list=default,sm_30' or similar), and the user
builds with '-foffload-options=nvptx-none=-march=sm_30': behaves as (2).
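To make the three cases concrete, here's a rough model in Python.  This
is purely illustrative: none of these names exist in GCC or libgomp, the
"device sm >= code sm" compatibility rule is a simplification, and the
sm_52 threshold for 'alloca' is just the baseline proposed in this
thread.

```python
# Illustrative model only; nothing here is GCC or libgomp API.

# Assumption from this thread: 'alloca' needs the sm_52 / PTX 7.3 baseline.
ALLOCA_MIN_SM = 52


def sm_number(arch):
    """Turn an architecture string such as 'sm_52' into the integer 52."""
    if not arch.startswith("sm_"):
        raise ValueError("expected 'sm_NN', got %r" % (arch,))
    return int(arch[3:])


def build_outcome(code_arch, uses_alloca):
    """Compile time: what happens when offloading code is built for code_arch."""
    if uses_alloca and sm_number(code_arch) < ALLOCA_MIN_SM:
        return 'sorry ("target cannot support alloca")'
    return "builds"


def run_outcome(code_arch, device_arch):
    """Run time: can a device of device_arch execute code built for code_arch?

    Simplification: a device is treated as capable if its sm number is at
    least that of the code.
    """
    if sm_number(device_arch) >= sm_number(code_arch):
        return "runs on device"
    return "error or device ignored (to be decided)"


if __name__ == "__main__":
    device = "sm_35"  # hypothetical pre-sm_52 GPU
    # Case (1): default sm_52 code, no sm_30 multilib variants.
    print("(1)", build_outcome("sm_52", True), "/", run_outcome("sm_52", device))
    # Cases (2) and (3): code built for sm_30, either via '--with-arch=sm_30'
    # or via '-march=sm_30' with sm_30 multilib variants available.
    print("(2)/(3)", build_outcome("sm_30", False), "/", run_outcome("sm_30", device))
    print("(2)/(3) with alloca:", build_outcome("sm_30", True))
```

So (2) and (3) only differ in how the sm_30 target libraries come to
exist; for the user's code, the compile-time and run-time behavior is the
same.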

Now, (1) and (2) behave as expected (other than that they could maybe
emit more helpful error messages).  My concern with (3a) is that the
sm_30 multilib variants built by default will largely be unused (as
they're obsolete with moderately recent Nvidia GPU hardware), but I'll be
happy to implement (3b) if people think that's still helpful.

In fact, (3b) can then generally support 'configure'-time selection of
further multilib variants to be built (for example,
'--with-multilib-list=default,sm_30,sm_89') -- but not use them as
default '-march=[...]', in contrast to what '--with-arch=sm_30' or
'--with-arch=sm_89' does, for example.  I'll look into that.
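As a sketch of the intended difference between the two 'configure'
options (Python, illustrative only): '--with-multilib-list' is the
not-yet-implemented option floated above, its exact name and syntax are
not settled, and I'm assuming here that 'default' expands to just the
sm_52 variant and that the '--with-arch' architecture always gets its
multilib variant built.

```python
# Illustrative model of the proposed semantics; not actual 'configure' logic.

DEFAULT_ARCH = "sm_52"  # proposed new default from this thread


def configure(with_arch=None, multilib_list="default"):
    """Return (default -march, sorted list of multilib variants built).

    with_arch models '--with-arch=[...]'; multilib_list models the
    hypothetical '--with-multilib-list=[...]'.
    """
    default_march = with_arch or DEFAULT_ARCH
    built = set()
    for item in multilib_list.split(","):
        # Assumption: 'default' stands for the default set, here just sm_52.
        built.add(DEFAULT_ARCH if item == "default" else item)
    # Assumption: the default '-march' always has its multilib variant built.
    built.add(default_march)
    return default_march, sorted(built)


if __name__ == "__main__":
    print(configure())
    print(configure(with_arch="sm_30"))
    print(configure(multilib_list="default,sm_30,sm_89"))
```

That is, the multilib list only changes which variants are built, whereas
'--with-arch' additionally changes which one is used if the user doesn't
specify '-march=[...]'.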


Grüße
 Thomas
