[PATCH] D71241: [OpenMP][WIP] Use overload centric declare variants

Johannes Doerfert via Phabricator via cfe-commits Thu, 12 Dec 2019 11:44:44 -0800

jdoerfert added a comment.

In D71241#1782173 <https://reviews.llvm.org/D71241#1782173>, @ABataev wrote:


> In D71241#1782157 <https://reviews.llvm.org/D71241#1782157>, @jdoerfert wrote:
>
> > In D71241#1781955 <https://reviews.llvm.org/D71241#1781955>, @ABataev wrote:
> >
> > > In D71241#1780715 <https://reviews.llvm.org/D71241#1780715>, @jdoerfert 
> > > wrote:
> > >
> > > > In D71241#1779097 <https://reviews.llvm.org/D71241#1779097>, @ABataev 
> > > > wrote:
> > > >
> > > > > In D71241#1778963 <https://reviews.llvm.org/D71241#1778963>, 
> > > > > @jdoerfert wrote:
> > > > >
> > > > > > In D71241#1778736 <https://reviews.llvm.org/D71241#1778736>, 
> > > > > > @ABataev wrote:
> > > > > >
> > > > > > > In D71241#1778717 <https://reviews.llvm.org/D71241#1778717>, 
> > > > > > > @jdoerfert wrote:
> > > > > > >
> > > > > > > > >> There is no evidence that this is more complicated. By all 
> > > > > > > > >> measures, this is less complicated (see also below). It is 
> > > > > > > > >> also actually doing the right thing when it comes to code 
> > > > > > > > >> emission. Take https://godbolt.org/z/sJiP3B for example. The 
> > > > > > > > >> calls are wrong and the definition of base is missing.
> > > > > > > > > 
> > > > > > > > > How did you measure it? I have a completely different 
> > > > > > > > > opinion. Also, tried to reproduce the problem locally, could 
> > > > > > > > > not reproduce. It seems to me, the output of the test misses 
> > > > > > > > > several important things. You can check it yourself. 
> > > > > > > > > `tgt_target_teams()` call uses `@.offload_maptypes` global 
> > > > > > > > > var but it is not defined.
> > > > > > > >
> > > > > > > > Here is the link with the globals not hidden: 
> > > > > > > > https://godbolt.org/z/5etB5S
> > > > > > > >  The base function is called both times but should not be 
> > > > > > > > called at all. What is your local output and why does it differ?
> > > > > > >
> > > > > > >
> > > > > > > On the host `base` is an alias for the `hst` function. On the 
> > > > > > > device `base` has the body of `dev` function because NVPTX does 
> > > > > > > nit support function aliases (10+ suppprts it, but LLVM does not 
> > > > > > > support it yet). Try to change the bodies of dev and hst and you 
> > > > > > > will see.
> > > > > > >
> > > > > > > I tried to keep original function names to improve debugging and 
> > > > > > > make users less wonder why instead of `base` something else is 
> > > > > > > called.
> > > > > >
> > > > > >
> > > > > > How does that work with linking? Another translation unit can call 
> > > > > > both the base and hst/dev function, right? I mean they both need to 
> > > > > > be present.
> > > > >
> > > > >
> > > > > If hst or dev are needed, they are emitted too, independently. On the 
> > > > > host, the variamt function is emitted and base function is set to be 
> > > > > an alias of the variant function. On the device we just inherit the 
> > > > > body of the variant function, but this variant function also can be 
> > > > > emitted, if used independently.
> > > >
> > > >
> > > > This is confusing:
> > > >
> > > > 1. Especially for debugging we should do the same on host and device.
> > >
> > >
> > > We do the same on the host and on the device. The user will see that he 
> > > is the context of originally called function `base`. It is the same 
> > > behavior just like the behavior of the GNU gcc `alias` attribute.
> >
> >
> > No we do not. You said it yourself just in the last comment: one time you 
> > emit an alias (host), one time you "emit code into the base function" 
> > (device).
> >  That is not the same thing, especially if you look at debug information, 
> > stack frames, ...
>
>
> Yes, formally it is not the same. But only because NVPTX does not support 
> function aliasing.


Long story short, code generation (and debugging) is different in the existing 
approach as a workaround.

>> 
>> 
>>>> 2. The device function now exists twice, that is bad.
>>> 
>>> It is not so bad, actually. In your solution, you have almost the same 
>>> situation. Plus, it is not a big problem taking into account that most of 
>>> the function calls will be inlined and function will be eliminated. As soon 
>>> as we have support for global aliases in LLVM/NVPTX, we can improve it.
>> 
>> It is bad to duplicate code for no reason. The SemaOverload solution has not 
>> "almost the same situation", as it does not replace the base function body 
>> with the variant function body.
> 
> Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the exisiting approach as a workaround.

>> 
>> 
>>>> 3. How is this supposed to work with type corrections? I mean the variant 
>>>> needs to be compatible but not necessarily of the same type, right? 
>>>> https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it 
>>>> crashes for me afterwards).
>>> 
>>> Actually, these functions are not compatible. We're missing some extra 
>>> checks in Sema, will add them later.
>> 
>> That is not necessarily clear to me. The standard does not say the types 
>> need to be equal but compatible.
> 
> Compatible in terms of the base language. In C, these functions are not 
> compatible. Try to declare the function `base` as `int base(float)` and then 
> try to redeclare it as `int base(int)`. You will get the error because the 
> types are not compatible. For example, `int ()` and `int(int)` are compatible 
> though are not equal.

The problem exists for C++ with short vs int as well: 
https://godbolt.org/z/wYB_pX

>> 
>> 
>>>> 4. On the host the expression `&base == &hst` will evaluate to true with 
>>>> the alias.
>>> 
>>> And what is the problem with this? I believe, with your solution the result 
>>> will be the same because you will just replace `base` with the `hst` upon 
>>> overloading resolution.
>> 
>> But I can do it for calls only. Aliases are not as fine granular. That is 
>> also why the current code cannot handle the `construct` trait but an 
>> overload solution can.
> 
> I see no problem here.

The current approach *cannot* be used to implement `construct` without adding a 
*third* way to handle declare variants. So we have aliases (no 1), if supported 
by the architecture and if the trait is not `construct`. We have "function 
hijacking" (no 2), if aliases are not supported and if the trait is not 
`construct`. We will need something else if the trait is `construct` (no 3). 
The alternative is to handle *all of it* during overload resolution. If you 
still believe it is better to modify the IR in various ways *after* we created 
it, I'll move this discussion on the openmp-dev list to unblock it.

>>>> In D71241#1779779 <https://reviews.llvm.org/D71241#1779779>, @ABataev 
>>>> wrote:
>>>> 
>>>>> Here is the example that does not work with the proposed solution but 
>>>>> works with the existing one:
>>>>>
>>>>>   static void cpu() { asm("nop"); }
>>>>>  
>>>>>   #pragma omp declare variant(cpu) match(device = {kind(cpu)})
>>>>>   static __attribute__((used)) void wrong_asm() {
>>>>>     asm ("xxx");
>>>>>   }
>>>>>
>>>>>
>>>>> The existing solution has some problems with the delayed error messages 
>>>>> too, but they are very easy to fix.
>>>> 
>>>> 
>>>> I don't understand this example. What is the expected outcome here (I get 
>>>> the error below from ToT clang). Why is that not achievable by a 
>>>> SemaOverload solution?
>>>>  `asm.c:4:35: error: function with '#pragma omp declare variant' must have 
>>>> a prototype`
>>> 
>>> Try to compile as C++:
>>> 
>>>   clang++ -с -fopenmp repro.cpp
>> 
>> You did not answer any of the questions here. If you ignore my comments this 
>> is not working.
>>  What is the expected outcome here?  Why is that not achievable by a 
>> SemaOverload solution?
> 
> Expected result - the code is compiled successfully. Your solution will 
> produce the error message for incorrect asm. SemaOverload won't help you with 
> this. Assume, you have a base function with the target-specific code for the 
> host and a variant with the target-specific code on the device.

I compiled it as cpp with both approaches just fine (ignoring the math errors 
due to the wrong order of things because I had the begin/end declare variant 
patch build as well).
No asm error on my side, did you run it or just assume it woulnd't work?
In fact, the current solution will disregard the `used` attribute here and not 
emit the function, which is bad. The Sema solution will dispatch the right 
calls and honor the `used` attribute properly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71241/new/

https://reviews.llvm.org/D71241



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D71241: [OpenMP][WIP] Use overload centric declare variants

Reply via email to