rjmccall added inline comments.
================
Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+    if (isBetterMultiversionCandidate(Cand1, Cand2))
+      return true;
+
----------------
erichkeane wrote:
> yaxunl wrote:
> > echristo wrote:
> > > rjmccall wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > If we move anything below this check, it needs to figure out a tri-state so that it can return false if `Cand2` is a better candidate than `Cand1`. Now, that only matters if multiversion functions are supported under CUDA, but if you're relying on them not being supported, that should at least be commented on.
> > > > > Multiversion host functions are orthogonal to CUDA and therefore should be supported. Multiversion device, host device, and global functions are not supported. However, this change does not make things worse, and it should continue to work if they are supported.
> > > > >
> > > > > Host/device-based overload resolution is mostly for determining the viability of a function. If two functions are both viable, other factors should take precedence in the preference. This general rule has been applied to cases other than multiversion; I think it should also apply to multiversion.
> > > > >
> > > > > I will make isBetterMultiversionCandidate three-state.
> > > > > This general rule has been applied to cases other than multiversion; I think it should also apply to multiversion.
> > > >
> > > > Well, but the multiversion people could say the same: that multiversioning is for picking an alternative among otherwise-identical functions, and HD and H functions are not otherwise-identical.
> > > >
> > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > Adding @erichkeane here as well.
> > >
> > > I think this makes sense, but I can see a reason to multiversion a function that will run on host and device.
> > > A version of some matrix mult that takes advantage of 3 host architectures and one CUDA one? Am I missing something here?
> > My understanding is that a multiversion function is for a specific CPU (GPU). Let's say we want to have a function f for gfx900, gfx906, sandybridge, and ivybridge; shouldn't they be more like
> >
> > ```
> > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > __device__ __attribute__((cpu_specific(gfx900))) f();
> > __device__ __attribute__((cpu_specific(gfx906))) f();
> > ```
> > instead of all `__device__ __host__` functions?
> IMO, it doesn't make sense for functions to be BOTH host and device; they'd have to be just one. Otherwise I'm not sure how the resolver behavior is supposed to work. The whole idea is that the definition is chosen at runtime.
>
> Unless `__host__ __device__ void foo();` is TWO declaration chains (meaning two separate AST entries), it doesn't make sense to have multiversioning on it (and then, how it would be spelled is awkward/confusing to me).
>
> In the above case, if those 4 declarations are not 2 separate root AST nodes, multiversioning won't work.
There are certainly functions that ought to be usable from either host or device context — any inline function that just does ordinary language things should be in that category. Also, IIUC, many declarations are *inferred* to be `__host__ __device__`, or can be mass-annotated with pragmas, and those are probably the main reasons this might matter — we might include a header in CUDA mode that declares a multi-versioned function, and we should handle it right.

My read of how CUDA programmers expect this to work is that they see the `__host__` / `__device__` attributes primarily as a mechanism for catching problems where you're using the wrong functions for the current configuration.
That is, while we allow overloading by `__host__`/`__device__`-ness, users expect those attributes to be used mostly as a filter for what's "really there" rather than as strictly segregating the namespace. So I would say that CUDA programmers would probably expect the interaction with multiversioning to be:

- Programmers can put `__host__`, `__device__`, or both on a variant depending on where it is usable.
- Dispatches should simply ignore any variants that aren't usable for the current configuration.

And specifically they would not expect e.g. a `__host__` dispatch function to only consider `__host__` variants — it should be able to dispatch to anything available, which is to say, it should also include `__host__ __device__` variants. Similarly (and probably more usefully), a `__host__ __device__` dispatch function being compiled for the device should also consider pure `__device__` functions, and so on.

If we accept that, then I think it gives us a much better idea of how to resolve the priority of the overload rules. The main impact of `isBetterMultiversionCandidate` is to try to ensure that we're looking at the `__attribute__((cpu_dispatch))` function instead of one of the `__attribute__((cpu_specific))` variants. (It has no effect on `__attribute__((target))` multi-versioning, mostly because it doesn't need to: target-specific variants don't show up in lookup with `__attribute__((target))`.) That rule should take precedence over the CUDA preference for exact matches, because e.g.
if we're compiling this:

```
__host__ __device__ int magic(void) __attribute__((cpu_dispatch("...")));
__host__ __device__ int magic(void) __attribute__((cpu_specific(generic)));
__host__ int magic(void) __attribute__((cpu_specific(mmx)));
__host__ int magic(void) __attribute__((cpu_specific(sse)));
__device__ int magic(void) __attribute__((cpu_specific(some_device_feature)));
__device__ int magic(void) __attribute__((cpu_specific(some_other_device_feature)));
```

then we don't want the compiler to prefer a CPU-specific variant over the dispatch function just because one of the variants was marked `__host__`.


CHANGES SINCE LAST ACTION

  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits