tra added a comment.

The key difference between C++ and CUDA/HIP, as implemented in clang, is that 
`__host__` and `__device__` attributes are considered during function 
overloading in CUDA and HIP, so `__host__ void foo()`, `__device__ void foo()` 
and `__host__ __device__ void foo()` are three different functions and not 
redeclarations of the same function. Details of the original proposal are here: 
 https://goo.gl/EXnymm.
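
A minimal sketch of what target overloading looks like, assuming clang's CUDA or HIP mode. The names `where_am_i` and `caller` are hypothetical and not from the review, and the `__CUDACC__` / `__HIP__` guard is an assumption added only so the snippet also builds as plain C++:

```cpp
// Hypothetical names throughout; this is a sketch, not libc++ code.
#if defined(__CUDACC__) || defined(__HIP__)
// CUDA/HIP mode in clang: same signature, different target attributes,
// so these are two distinct overloads rather than a redefinition.
__device__ inline int where_am_i() { return 1; }  // used by device-side callers
__host__ inline int where_am_i() { return 0; }    // used by host-side callers
#define SKETCH_HD __host__ __device__
#else
// Plain C++: there is only a host side, and a second definition with the
// same signature would be a redefinition error.
inline int where_am_i() { return 0; }
#define SKETCH_HD
#endif

// A __host__ __device__ caller picks the overload that matches the side
// currently being compiled.
SKETCH_HD inline int caller() { return where_am_i(); }
```

On the host side `caller()` returns 0. The device-side result of 1 only exists when the file is compiled as CUDA or HIP.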

In D102507#2838776 <https://reviews.llvm.org/D102507#2838776>, @yaxunl wrote:

> 2. Some libc++ functions are mostly common for device or host with minor 
> differences. For such functions, we can make them `__device__ __host__` and 
> use `#if __HIP_DEVICE_COMPILE__` (indicating device compilation) for the 
> minor difference in the function body.

I think we should rely on target overloading when possible, instead of the 
preprocessor. Minimizing the differences between the code seen by the compiler 
during host-side and device-side compilation will minimize potential issues.
Which approach we end up using is an implementation detail.
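
To make the contrast concrete, here is a hedged sketch of both approaches side by side. `SKETCH_HD`, `bias`, `add_bias_pp` and `add_bias_ovl` are hypothetical names, the return values are placeholders that only mark which side ran, and the `__CUDA_ARCH__` check is the assumed CUDA counterpart of `__HIP_DEVICE_COMPILE__`:

```cpp
// Hypothetical macro: expands to nothing outside CUDA/HIP compilation.
#if defined(__CUDACC__) || defined(__HIP__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Preprocessor approach: one __host__ __device__ function whose body
// differs between the host pass and the device pass.
SKETCH_HD inline int add_bias_pp(int x) {
#if defined(__HIP_DEVICE_COMPILE__) || defined(__CUDA_ARCH__)
  return x + 2;  // text seen only during device compilation
#else
  return x + 3;  // text seen only during host compilation
#endif
}

// Target-overloading approach: the difference lives in a pair of overloads,
// and the common function has the same text in both passes.
#if defined(__CUDACC__) || defined(__HIP__)
__device__ inline int bias() { return 2; }
__host__ inline int bias() { return 3; }
#else
inline int bias() { return 3; }  // plain C++ fallback: host only
#endif

SKETCH_HD inline int add_bias_ovl(int x) { return x + bias(); }
```

With overloading, both compilation passes parse both `bias` definitions and an identical `add_bias_ovl`, which is the reduction in host/device differences argued for above.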

> 3. Some libc++ functions have different implementations for device and host. 
> We can leave these host functions as they are and add overloaded 
> `__device__` functions.
>
> There are two ways to mark libc++ functions as `__device__ __host__`:
>
> 1. Define a macro which expands to empty for non-HIP programs and expands to 
> `__device__ __host__` for HIP and add it to each libc++ function which is to 
> be marked as `__device__ __host__`.

One caveat of overloading based on target attributes is that we can't 
re-declare an existing function with `__device__ __host__`: the compiler will 
treat the attempted redeclaration as an overload of the original function, 
which has no attributes and is therefore implicitly `__host__`.
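
A small sketch of the consequence for the macro approach, assuming clang's CUDA or HIP mode. `SKETCH_HD`, `twice` and `thrice` are hypothetical names. The failing case is left in comments because it is not expected to build as CUDA/HIP:

```cpp
// Hypothetical macro: empty for non-CUDA/HIP programs.
#if defined(__CUDACC__) || defined(__HIP__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// Fine: the attributes are on the first declaration and on every later
// declaration, so these all name the same function.
SKETCH_HD inline int twice(int x);
SKETCH_HD inline int twice(int x) { return 2 * x; }

// Problematic in CUDA/HIP mode (comments only, not compiled here):
//   int thrice(int x);            // no attributes, so implicitly __host__
//   SKETCH_HD int thrice(int x);  // not seen as a redeclaration of the
//                                 // line above, per the caveat
```

The practical upshot is that the macro has to be applied where the function is first declared. It cannot be bolted on afterwards from a separate wrapper header.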

> 2. Define macros which expand to empty for non-HIP programs and expand to 
> `#pragma clang force_cuda_host_device begin/end` for HIP and put them at the 
> beginning and end of a file where all the functions are to be marked as 
> `__device__ __host__`.
>
> We plan to implement libc++ support in HIP device compilation in a 
> progressive approach, header by header, and document the supported libc++ 
> headers. We will prioritize libc++ headers to support based on 1) user 
> requests, 2) whether it has already been supported through clang wrapper 
> headers (patching), 3) usefulness for device execution, and 4) availability 
> of lower level support with HIP runtime.
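
For reference, the file-wide pragma option quoted above could be sketched as follows. The macro names `SKETCH_HD_BEGIN` / `SKETCH_HD_END` and the functions are hypothetical, and using `_Pragma` is an assumption on my part, made because a macro cannot expand to a `#pragma` directive directly:

```cpp
// Hypothetical macros: expand to the clang pragma in CUDA/HIP mode and to
// nothing otherwise, so the header stays valid plain C++.
#if defined(__CUDACC__) || defined(__HIP__)
#define SKETCH_HD_BEGIN _Pragma("clang force_cuda_host_device begin")
#define SKETCH_HD_END _Pragma("clang force_cuda_host_device end")
#else
#define SKETCH_HD_BEGIN
#define SKETCH_HD_END
#endif

SKETCH_HD_BEGIN
// Functions declared inside the region are treated as __host__ __device__
// in CUDA/HIP mode without annotating each one.
inline int square(int x) { return x * x; }
inline int cube(int x) { return x * square(x); }
SKETCH_HD_END
```

This keeps the individual declarations untouched, at the cost of marking everything in the region, including functions that may not be usable on the device.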

All of the above applies to CUDA, modulo the macro names and some differences 
in the builtins and the functions provided (or not) by the runtime on the GPU 
side.

https://reviews.llvm.org/D102507