| Field | Value |
| --- | --- |
| Issue | 157018 |
| Summary | [Flang][OpenMP] offload hierarchical parallelism failed on NVIDIA GPU |
| Labels | flang |
| Assignees | |
| Reporter | ye-luo |

Reproducer code
https://github.com/TApplencourt/OvO/blob/master/test_src/fortran/hierarchical_parallelism/reduction_add-real/target_teams_distribute__parallel_do.F90
```
$ OMP_TARGET_OFFLOAD=mandatory flang -fopenmp --offload-arch=sm_90 -O3 target_teams_distribute__parallel_do.F90
nvlink warning : Stack size for entry function '__omp_offloading_2e_73c80a0c__QQmain_l20' cannot be statically determined
$ ./a.out
"PluginInterface" error: Failure to copy data from device to host. Pointers: host = 0x00007fff77470d14, device = 0x00007f57ff600000, size = 4: "unknown or internal error" error in cuMemcpyDtoHAsync: an illegal memory access was encountered
omptarget error: Copying data from device failed.
omptarget error: Call to targetDataEnd failed, abort target.
omptarget error: Failed to process data after launching the kernel.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
Aborted
```
The actual failure occurred during the kernel run. Its error, `an illegal memory access was encountered`, was only reported by the subsequent `cuMemcpyDtoHAsync` call.
The same code offloads to an AMD GPU (gfx90a) without issue.
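
For readers who do not want to follow the link, the construct under test has roughly the shape below. This is a minimal sketch inferred from the test's name (`target_teams_distribute__parallel_do`, `reduction_add-real`), not a copy of the linked file: the program name, the loop bounds `n0` and `n1`, the variable `counter`, and the tolerance check are all assumptions made for illustration.

```fortran
! Illustrative sketch only; names, sizes, and the check are hypothetical.
program hierarchical_reduction
  implicit none
  integer, parameter :: n0 = 128, n1 = 128   ! assumed loop bounds
  integer :: i0, i1
  real :: counter

  counter = 0.0

  ! Outer level: iterations of i0 are distributed across teams on the device.
  !$omp target teams distribute reduction(+: counter)
  do i0 = 1, n0
    ! Inner level: each team forks threads that share the i1 loop.
    !$omp parallel do reduction(+: counter)
    do i1 = 1, n1
      counter = counter + 1.0
    end do
  end do

  ! Every (i0, i1) pair contributes 1, so the expected sum is n0*n1.
  if (abs(counter - real(n0*n1)) > 1.0e-3 * real(n0*n1)) then
    print *, 'wrong value', counter, 'expected', n0*n1
    error stop 1
  end if
  print *, 'ok', counter
end program hierarchical_reduction
```

It can be built with the same command line as in the log above, substituting the file name.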