================
@@ -176,3 +176,34 @@ Predefined Macros
    * - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
      - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.
 
+C++20 Concepts with HIP and CUDA
+--------------------------------
+
+In Clang, when working with HIP or CUDA, it's important to note that all 
constraints in C++20 concepts are assumed to be for the host side only. This 
behavior is consistent across both programming models, and developers should be 
aware of this assumption when writing code that utilizes C++20 concepts.
+
+Example:
+.. code-block:: c++
+
+   template <class T>
+   concept MyConcept = requires(T& obj) {
+     my_function(obj);  // Assumed to be a host-side requirement
----------------
zygoloid wrote:

If I understand correctly, normally a template is usable from either host or 
device (depending on whether it ends up calling any host-only or device-only 
function). This choice for concepts seems like it's going to be problematic for 
that model. Something as simple as:

```c++
template<Copyable T> T f(T x) { return x; }
```

... should really be callable on the host or device side if `T` is copyable on 
the host or device side, respectively. Using the host side in all cases will 
mean that headers such as the C++ `<complex>` or `<functional>` may stop doing 
the right thing in some cases if/when they get extended to use concepts. And 
with this patch there doesn't seem to be anything that the authors of those 
headers can do about it.

Perhaps it would be better for the host/device choice in a concept satisfaction 
check to depend on the context in which the concept is required to be satisfied 
(which I would imagine is what happened by chance before this patch), and for 
us to include the CUDA context as part of the constraint satisfaction cache 
key? That direction seems like it would give results closer to what we'd get 
from the split compilation model. I don't know if that actually works in 
general, though. For example, given:

```c++
__host__ X<T> host_global;
__device__ X<T> device_global;
```

... where `X` is a constrained template, what seems like it should happen here 
is that we take the `__host__` / `__device__` into account when 
concept-checking `X`'s template arguments, but I'd worry that we don't have the 
host/device information to hand when checking the concept satisfaction query 
for `X`.
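
To make the cache-key idea concrete, here is a toy model in plain C++. It is 
only a sketch under stated assumptions, not how Clang's satisfaction cache is 
implemented: `Target`, `SatisfactionCache`, and `demo` are hypothetical names, 
and the constraint is identified by a string purely for illustration.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the host/device context of a satisfaction check.
enum class Target { Host, Device };

// Toy cache keyed on (constraint, target) instead of the constraint alone,
// so a host-side result is never reused for a device-side query.
class SatisfactionCache {
  using Key = std::pair<std::string, Target>;

public:
  using Evaluator = std::function<bool(Target)>;

  bool isSatisfied(const std::string& constraint, Target ctx,
                   const Evaluator& evaluate) {
    const Key key{constraint, ctx};
    if (auto it = cache_.find(key); it != cache_.end())
      return it->second;       // cached result for this exact context
    ++evaluations_;
    const bool result = evaluate(ctx);
    cache_.emplace(key, result);
    return result;
  }

  int evaluations() const { return evaluations_; }

private:
  std::map<Key, bool> cache_;
  int evaluations_ = 0;
};

// Usage: a constraint that only holds on the device is evaluated once per
// context, and the two contexts get different answers.
inline void demo() {
  SatisfactionCache cache;
  auto deviceOnly = [](Target t) { return t == Target::Device; };
  assert(!cache.isSatisfied("MyConcept<T>", Target::Host, deviceOnly));
  assert(cache.isSatisfied("MyConcept<T>", Target::Device, deviceOnly));
  assert(cache.evaluations() == 2);
}
```

The open question above still applies: the model presumes the context is known 
at the point of the query, which may not hold for declarations like the globals.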

More broadly, I think there will be cases where a CUDA developer wants to 
check, from host code, whether a constraint would be satisfied on the device, 
and some mechanism for doing that seems useful. I think that *can* be done with 
the model I suggest above, by putting a kernel call inside a `requires` 
expression, but it seems awkward, so perhaps some syntax for explicitly 
evaluating a *concept-id* in a particular host/device context would be useful.

But it definitely seems worthwhile to figure out what rule NVCC is using here.

https://github.com/llvm/llvm-project/pull/67721
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
