Meinersbur wrote:
NB: #201103 adds support for DECLARE_TARGET by setting a device-type flag in
the AST that can be specialized later in MLIR, i.e. still the same .mod for
device and target. Handling `!$omp declare variant` requires more effort;
from the OpenMP examples document:
```f90
subroutine base_saxpy(s,x,y) !! base function
real,intent(inout) :: s,x(:),y(:)
!$omp declare variant( avx512_saxpy ) &
!$omp& match( device={isa("core-avx512")} )
y = s*x + y
end subroutine
subroutine avx512_saxpy(s,x,y) !! function variant
...
```
Keeping a single .mod file for all targets means that `avx512_saxpy` must be
kept in case it is used for a target that supports avx512. In contrast, Clang
just skips anything that does not match the current compilation target, either
in the preprocessor or while creating the AST. A rationale is that
`avx512_saxpy` may contain inline-asm or vector builtins that the current
compilation just does not know about and that would fail if parsed. gcc on the
other hand will parse everything and diverge between host and device at a later
stage, like Flang does. Thanks to the insistence of gcc implementors, OpenMP
does not have
predefined preprocessor symbols that are different when compiling for different
targets (like
[`__HIP_DEVICE_COMPILE__`](https://clang.llvm.org/docs/HIPSupport.html#predefined-macros),
[`__CUDA_ARCH__`](https://docs.nvidia.com/cuda/cuda-programming-guide/05-appendices/cpp-language-extensions.html#codecell0),
[`__SYCL_DEVICE_ONLY__`](https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_preprocessor_directives_and_macros))
which require the split to happen at the preprocessor stage.
I don't know whether CUDA-Fortran has a `__CUDA_ARCH__` preprocessor definition
or similar which could be used to compile very different sources for host and
devices.
https://github.com/llvm/llvm-project/pull/200863