rjmccall added inline comments.
================
Comment at: clang/lib/Sema/SemaOverload.cpp:9749
+    if (isBetterMultiversionCandidate(Cand1, Cand2))
+      return true;
+
----------------
erichkeane wrote:
> yaxunl wrote:
> > echristo wrote:
> > > rjmccall wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > If we move anything below this check, it needs to figure out a tri-state so that it can return false if `Cand2` is a better candidate than `Cand1`. Now, that only matters if multiversion functions are supported under CUDA, but if you're relying on them not being supported, that should at least be commented on.
> > > > > Multiversion host functions are orthogonal to CUDA and therefore should be supported. Multiversion device, host device, and global functions are not supported. However, this change does not make things worse, and it should continue to work if they are supported.
> > > > >
> > > > > Host/device-based overload resolution is mostly for determining the viability of a function. If two functions are both viable, other factors should take precedence in the preference. This general rule has been applied to cases other than multiversion; I think it should also apply to multiversion.
> > > > >
> > > > > I will make isBetterMultiversionCandidate three-state.
> > > > > This general rule has been applied to cases other than multiversion; I think it should also apply to multiversion.
> > > >
> > > > Well, but the multiversion people could say the same: that multiversioning is for picking an alternative among otherwise-identical functions, and HD and H functions are not otherwise-identical.
> > > >
> > > > CC'ing @echristo for his thoughts on the right ordering here.
> > > Adding @erichkeane here as well.
> > >
> > > I think this makes sense, but I can see a reason to multiversion a function that will run on host and device.
> > > A version of some matrix mult that takes advantage of 3 host architectures and one CUDA one? Am I missing something here?
> > My understanding is that a multiversion function is for a specific CPU (GPU). Let's say we want to have a function f for gfx900, gfx906, sandybridge, and ivybridge; shouldn't they be more like
> >
> > ```
> > __host__ __attribute__((cpu_specific(sandybridge))) f();
> > __host__ __attribute__((cpu_specific(ivybridge))) f();
> > __device__ __attribute__((cpu_specific(gfx900))) f();
> > __device__ __attribute__((cpu_specific(gfx906))) f();
> > ```
> > instead of all `__device__ __host__` functions?
> IMO, it doesn't make sense for functions to be BOTH host and device; they'd have to be just one. Otherwise I'm not sure how the resolver behavior is supposed to work. The whole idea is that the definition is chosen at runtime.
>
> Unless `__host__ __device__ void foo();` is TWO declaration chains (meaning two separate AST entries), it doesn't make sense to have multiversioning on it (and then, how it would be spelled is awkward/confusing to me).
>
> In the above case, if those 4 declarations are not 2 separate root AST nodes, multiversioning won't work.
There are certainly functions that ought to be usable from either host or device context — any inline function that just does ordinary language things should be in that category. Also, IIUC, many declarations are *inferred* to be `__host__ __device__`, or can be mass-annotated with pragmas, and those are probably the main reasons this might matter — we might include a header in CUDA mode that declares a multi-versioned function, and we should handle it right.

My read of how CUDA programmers expect this to work is that they see the `__host__` / `__device__` attributes primarily as a mechanism for catching problems where you're using the wrong functions for the current configuration.
That is, while we allow overloading by `__host__`/`__device__`-ness, users expect those attributes to be used mostly as a filter for what's "really there" rather than as strictly segregating the namespace. So I would say that CUDA programmers would probably expect the interaction with multiversioning to be:

- Programmers can put `__host__`, `__device__`, or both on a variant depending on where it is usable.
- Dispatches should simply ignore any variants that aren't usable for the current configuration.

And specifically they would not expect e.g. a `__host__` dispatch function to only consider `__host__` variants — it should be able to dispatch to anything available, which is to say, it should also include `__host__ __device__` variants. Similarly (and probably more usefully), a `__host__ __device__` dispatch function being compiled for the device should also consider pure `__device__` functions, and so on.

If we accept that, then I think it gives us a much better idea of how to resolve the priority of the overload rules. The main impact of `isBetterMultiversionCandidate` is to try to ensure that we're looking at the `__attribute__((cpu_dispatch))` function instead of one of the `__attribute__((cpu_specific))` variants. (It has no effect on `__attribute__((target))` multi-versioning, mostly because it doesn't need to: target-specific variants don't show up in lookup with `__attribute__((target))`.) That rule should take precedence over the CUDA preference for exact matches, because e.g.
if we're compiling this:

```
__host__ __device__ int magic(void) __attribute__((cpu_dispatch("...")));
__host__ __device__ int magic(void) __attribute__((cpu_specific(generic)));
__host__ int magic(void) __attribute__((cpu_specific(mmx)));
__host__ int magic(void) __attribute__((cpu_specific(sse)));
__device__ int magic(void) __attribute__((cpu_specific(some_device_feature)));
__device__ int magic(void) __attribute__((cpu_specific(some_other_device_feature)));
```

then we don't want the compiler to prefer a CPU-specific variant over the dispatch function just because one of the variants was marked `__host__`.


CHANGES SINCE LAST ACTION

  https://reviews.llvm.org/D77954/new/

https://reviews.llvm.org/D77954

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits