On Fri, 23 Oct 2015, Jakub Jelinek wrote:
> Thus, if .shared function local is allowed, we'd need to emit two copies of
> foo, one which assumes it is run in the teams context and one which assumes
> it is run in the parallel context. If automatic vars can be only .local,
> we are just in big trouble and I guess we really want to investigate what
> others supporting PTX/Cuda are trying to do here.

.shared is statically allocated.

There's an implementation of nvptx offloading in Clang/LLVM at
https://github.com/clang-omp ; they place data that can be shared in either
.shared or global memory (user-configurable, I think). I'm not sure how they
deal with recursion, or with the uncertainty you describe regarding the 'foo'
function in your example.

Can you point me to other compilers implementing OpenMP offloading for PTX?

Alexander