https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122783
Bug ID: 122783
Summary: gcc 15,16 refuses to compile correctly with nvptx
target blackwell cards after drivers with cuda-13 were
installed (result of an abi change in cuda-13)
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: schulz.benjamin at googlemail dot com
Target Milestone: ---
First, I have installed
x11-drivers/nvidia-drivers-580.95.05:0/580::gentoo
dev-util/nvidia-cuda-toolkit-12.9.1-r1:0/12.9.1::gentoo
sys-kernel/gentoo-kernel-6.17.8:6.17.8::gentoo
nvidia-smi says:
NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0
NVIDIA GeForce RTX 5060 Ti
nvptx-arch:
sm_120
Also, I have compiled an offload gcc.
I compile the following code with
gcc -fopenmp -foffload=nvptx-none -save-temps -foffload-options=-march=sm_89
-fno-stack-protector -o gpu_compiler_test ./main.cpp
I use sm_89, since higher sm values do not currently seem to be supported by gcc 16.
---------------------
#include "omp.h"
int main(int argc, char** argv){
    int i=0;
    #pragma omp target map(tofrom:i)
    {
        i=i+1;
    }
    return 0;
}
-----------------
This compiles, and also seems to run, without giving an error!
(Severe errors would only appear with code that explicitly accesses
arrays allocated with omp_target_alloc.)
The errors only appear when checking for CUDA problems. Running
compute-sanitizer --tool memcheck ./gpu_compiler_test
shows this:
========= COMPUTE-SANITIZER
========= Program hit CUDA_ERROR_INVALID_CONTEXT (error 201) due to "invalid
device context" on CUDA API call to cuCtxGetDevice.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: GOMP_OFFLOAD_init_device in plugin-nvptx.c:1371
[0x5677] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_init_device in target.c:3000 [0x3fc05] in
libgomp.so.1
========= Host Frame: resolve_device in target.c:190 [0x3fdf4] in
libgomp.so.1
========= Host Frame: GOMP_target_ext in target.c:3279 [0x4067e] in
libgomp.so.1
========= Host Frame: main [0x1230] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetGlobal_v2.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: GOMP_OFFLOAD_load_image in plugin-nvptx.c:1728
[0x7688] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_load_image_to_device in target.c:2608
[0x34100] in libgomp.so.1
========= Host Frame: gomp_init_device in target.c:3011 [0x3fc77] in
libgomp.so.1
========= Host Frame: resolve_device in target.c:190 [0x3fdf4] in
libgomp.so.1
========= Host Frame: GOMP_target_ext in target.c:3279 [0x4067e] in
libgomp.so.1
========= Host Frame: main [0x1230] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetGlobal_v2.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: GOMP_OFFLOAD_load_image in plugin-nvptx.c:1800
[0x76d7] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_load_image_to_device in target.c:2608
[0x34100] in libgomp.so.1
========= Host Frame: gomp_init_device in target.c:3011 [0x3fc77] in
libgomp.so.1
========= Host Frame: resolve_device in target.c:190 [0x3fdf4] in
libgomp.so.1
========= Host Frame: GOMP_target_ext in target.c:3279 [0x4067e] in
libgomp.so.1
========= Host Frame: main [0x1230] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: nvptx_do_global_cdtors in plugin-nvptx.c:1464
[0x3cdb] in libgomp-plugin-nvptx.so.1
========= Host Frame: GOMP_OFFLOAD_load_image in plugin-nvptx.c:1802
[0x7739] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_load_image_to_device in target.c:2608
[0x34100] in libgomp.so.1
========= Host Frame: gomp_init_device in target.c:3011 [0x3fc77] in
libgomp.so.1
========= Host Frame: resolve_device in target.c:190 [0x3fdf4] in
libgomp.so.1
========= Host Frame: GOMP_target_ext in target.c:3279 [0x4067e] in
libgomp.so.1
========= Host Frame: main [0x1230] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: nvptx_do_global_cdtors in plugin-nvptx.c:1482
[0x3ef6] in libgomp-plugin-nvptx.so.1
========= Host Frame: GOMP_OFFLOAD_load_image in plugin-nvptx.c:1802
[0x7739] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_load_image_to_device in target.c:2608
[0x34100] in libgomp.so.1
========= Host Frame: gomp_init_device in target.c:3011 [0x3fc77] in
libgomp.so.1
========= Host Frame: resolve_device in target.c:190 [0x3fdf4] in
libgomp.so.1
========= Host Frame: GOMP_target_ext in target.c:3279 [0x4067e] in
libgomp.so.1
========= Host Frame: main [0x1230] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: nvptx_do_global_cdtors in plugin-nvptx.c:1464
[0x3cdb] in libgomp-plugin-nvptx.so.1
========= Host Frame: GOMP_OFFLOAD_fini_device in plugin-nvptx.c:1407
[0x6042] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_target_fini in target.c:6104 [0x32ace] in
libgomp.so.1
========= Host Frame: [0x471d0] in libc.so.6
========= Host Frame: exit [0x472af] in libc.so.6
========= Host Frame: [0x27611] in libc.so.6
========= Host Frame: __libc_start_main [0x276b9] in libc.so.6
========= Host Frame: _start [0x10c4] in gpu_compiler_test
=========
========= Program hit CUDA_ERROR_NOT_FOUND (error 500) due to "named symbol not
found" on CUDA API call to cuModuleGetFunction.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: nvptx_do_global_cdtors in plugin-nvptx.c:1482
[0x3ef6] in libgomp-plugin-nvptx.so.1
========= Host Frame: GOMP_OFFLOAD_fini_device in plugin-nvptx.c:1407
[0x6042] in libgomp-plugin-nvptx.so.1
========= Host Frame: gomp_target_fini in target.c:6104 [0x32ace] in
libgomp.so.1
========= Host Frame: [0x471d0] in libc.so.6
========= Host Frame: exit [0x472af] in libc.so.6
========= Host Frame: [0x27611] in libc.so.6
========= Host Frame: __libc_start_main [0x276b9] in libc.so.6
========= Host Frame: _start [0x10c4] in gpu_compiler_test
=========
========= ERROR SUMMARY: 7 errors
Remarkably, it does not show an error on the console. If there were a loop in
the target region, it would even print some computed numbers, because of the
silent host fallback.
In a matrix multiplication, the results would differ depending on whether one
uses a single target teams distribute parallel for collapse(2) construct, or a
target teams distribute for the outer loop with a parallel for for the inner
loop.
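To make that remark concrete, here is a hedged sketch (my own minimal reconstruction, not my original benchmark) of the two loop forms whose results diverged on the broken setup; on a correct toolchain both must produce identical matrices.

```c
#include <omp.h>

#define N 32

/* Variant 1: one combined construct over both loops. */
void matmul_collapsed(const double *a, const double *b, double *c)
{
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:N*N], b[0:N*N]) map(from: c[0:N*N])
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += a[i * N + k] * b[k * N + j];
            c[i * N + j] = s;
        }
}

/* Variant 2: teams distribute over the outer loop, parallel for inside. */
void matmul_split(const double *a, const double *b, double *c)
{
    #pragma omp target teams distribute \
        map(to: a[0:N*N], b[0:N*N]) map(from: c[0:N*N])
    for (int i = 0; i < N; i++) {
        #pragma omp parallel for
        for (int j = 0; j < N; j++) {
            double s = 0.0;
            for (int k = 0; k < N; k++)
                s += a[i * N + k] * b[k * N + j];
            c[i * N + j] = s;
        }
    }
}
```

Both variants perform the inner k-summation in the same order, so their outputs must match element for element; a divergence between them indicates the scheduling/offload bug, not floating-point noise.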
And libgomp would crash if memory allocated by omp_target_alloc is accessed.
But the CUDA errors only show up on an explicit check, which may be dangerous
for unwitting users.
This may be connected to an ABI change in CUDA 13, which gets installed with
the new driver. Clang solved this by adding new entries for every device in
CUDA 13:
https://github.com/llvm/llvm-project/commit/dffd7f3d9a3294d21205251b986e76ec841cc750
After that fix for CUDA 13, Clang of course compiles the example above, and it
runs on the target GPU without any (CUDA) error and with visible kernels.
I am now closing the other two bugs related to this problem, where I did not
recognize that the failure was not in a particular construct but that
offloading did not work at all.