https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122280

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
I have now tried the following:

* Unpacked the first attachment (archive.tar.gz, attachment 62555)
* Compiled it with
    mpicxx -std=c++23 -fopenmp -I . mathdemonstrations.cpp

To aid debugging, I changed GPU_ONLY to AUTO in line 154:

  Math_Functions_Policy p1(Math_Functions_Policy::AUTO);

I also disabled everything after the next 'C.printtensor' call with '#if 0' (and added a '}' to keep the braces balanced).

Result: On one manual run, I got some 'wrong' results for the host but not for
the GPU. I then ran:

    for ((I=1; $I<=10; I++)); do OMP_TARGET_OFFLOAD=disabled GOMP_DEBUG=1 ./a.out | tail -n 12 > dis-$I; done

    for ((I=1; $I<=10; I++)); do OMP_TARGET_OFFLOAD=mandatory GOMP_DEBUG=1 ./a.out | tail -n 12 > mand-$I; done

The GOMP_DEBUG output confirms that the code was indeed offloaded.

* * *

Comparing those results with the clang output from comment 0 showed the same
results.

* * *

That said, when I first ran it manually, I got some differences for the host
fallback (but not for the GPU output), which did not reproduce when running it
as above.
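To check mechanically whether the ten runs of each mode agree with one another, a small helper along these lines could be used. It is only a sketch: it assumes the dis-N and mand-N files written by the loops above, and the function name compare_runs is hypothetical, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare_runs PREFIX N
# Compares PREFIX-2 .. PREFIX-N byte-for-byte against PREFIX-1,
# prints each run that differs plus a summary line, and returns
# non-zero if any run differed.
compare_runs() {
  local prefix=$1 n=$2 i ndiff=0
  for ((i = 2; i <= n; i++)); do
    if ! cmp -s "$prefix-1" "$prefix-$i"; then
      echo "$prefix-$i differs from $prefix-1"
      ndiff=$((ndiff + 1))
    fi
  done
  echo "$prefix: $ndiff of $((n - 1)) runs differ from run 1"
  [ "$ndiff" -eq 0 ]
}

# Usage, after running the two loops above:
#   compare_runs dis 10     # host fallback runs
#   compare_runs mand 10    # offloaded runs
#   cmp dis-1 mand-1        # host fallback vs. offloaded output
```

A byte-level cmp is deliberately strict: it flags any change in the last 12 lines of output, including harmless last-digit floating-point noise, so a reported difference still needs a manual look.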

This was on an x86-64 system with an Nvidia sm_86 GPU (RTX A1000 6GB Laptop),
using the distro compiler, GCC 15.2.1 20251006.

I also tried it with -O3, with the current git version of GCC, and with
-foffload-options=nvptx-none=-march=sm_80.

That's with NVIDIA-SMI 580.95.05, Driver Version: 580.95.05, CUDA Version:
13.0.

* * *

I wonder why I once got different results on the host, and why it fails for
the bug reporter. I hate Schroedinger bugs/heisenbugs!
