jdoerfert added a comment.

In D60907#1484529 <https://reviews.llvm.org/D60907#1484529>, @hfinkel wrote:

> In D60907#1483660 <https://reviews.llvm.org/D60907#1483660>, @jdoerfert wrote:
>
> > In D60907#1483615 <https://reviews.llvm.org/D60907#1483615>, @hfinkel wrote:
> >
> > > In D60907#1479370 <https://reviews.llvm.org/D60907#1479370>, @gtbercea 
> > > wrote:
> > >
> > > > In D60907#1479142 <https://reviews.llvm.org/D60907#1479142>, @hfinkel 
> > > > wrote:
> > > >
> > > > > In D60907#1479118 <https://reviews.llvm.org/D60907#1479118>, 
> > > > > @gtbercea wrote:
> > > > >
> > > > > > Ping @hfinkel @tra
> > > > >
> > > > >
> > > > > The last two comments in D47849 <https://reviews.llvm.org/D47849> 
> > > > > indicated exploration of a different approach, and one which still 
> > > > > seems superior to this one. Can you please comment on why you're now 
> > > > > pursuing this approach instead?
> > > >
> > > >
> > > > ...
> > > >
> > > > Hal, as far as I can tell, this solution is similar to yours but with a 
> > > > slightly different implementation. If there are particular aspects 
> > > > about this patch you would like to discuss/give feedback on please let 
> > > > me know.
> > >
> > >
> > > The solution I suggested had the advantages of:
> > >
> > > 1. Being able to directly reuse the code in 
> > > `__clang_cuda_device_functions.h`. On the other hand, using this solution 
> > > we need to implement a wrapper function for every math function. When 
> > > `__clang_cuda_device_functions.h` is updated, we need to update the 
> > > OpenMP wrapper as well.
> >
> >
> > I'd even go as far as to argue that `__clang_cuda_device_functions.h` 
> > should include the internal math.h wrapper to get all math functions. See 
> > also the next comment.
> >
> > > 2. Providing access to wrappers for other CUDA intrinsics in a natural 
> > > way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of 
> > > rnorm3d than __nv_rnorm3d in user code].
> >
> > @hfinkel 
> >  I don't see why you want to mix CUDA intrinsics with math.h overloads.
>
>
> What I had in mind was matching non-standard functions in a standard way. For 
> example, let's just say that I have a CUDA kernel that uses the rnorm3d 
> function, or I otherwise have a function that I'd like to write in OpenMP 
> that will make good use of this CUDA function (because it happens to have an 
> efficient device implementation). This is a function that CUDA provides, in 
> the global namespace, although it's not standard.
>
> Then I can do something like this (depending on how we set up the 
> implementation):
>
>   double rnorm3d(double a, double b, double c) {
>     return sqrt(a*a + b*b + c*c);
>   }
>   
>   ...
>   
>   #pragma omp target
>   {
>     double a = ..., b = ..., c = ...;
>     double r = rnorm3d(a, b, c);
>   }
>   
>
> and, if we use the CUDA math headers for CUDA math-function support, then 
> this might "just work." To be clear, I can see an argument that having this 
> work is a bad idea ;) -- but it has the advantage of providing a way to take 
> advantage of system-specific functions while still writing 
> completely-portable code.


Matching `rnorm3d` and replacing it with some nvvm "intrinsic" is not something 
I'd like to see depend on whether `math.h` was included. As you say, that is 
not how it works in CUDA either. I'm in favor of reusing the built-in 
recognition mechanism: if the target is nvptx and the name is `rnorm3d`, we 
match that name and use the appropriate intrinsic, as we already do for other 
builtins on other targets.
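
Roughly, and only as an illustration of the idea (this is not the actual clang 
builtin machinery; the helper name and the libdevice call are assumptions on my 
part), the name-based matching could look like this at the LLVM level:

  // Illustrative sketch only: recognize a call by name and redirect it to the
  // matching libdevice function when the module targets NVPTX.
  #include "llvm/ADT/Triple.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Module.h"

  using namespace llvm;

  static void lowerRnorm3dForNVPTX(Module &M) {
    if (!Triple(M.getTargetTriple()).isNVPTX())
      return; // only rewrite when the target is nvptx
    if (Function *Generic = M.getFunction("rnorm3d")) {
      // __nv_rnorm3d from CUDA's libdevice has the same signature.
      FunctionCallee Device =
          M.getOrInsertFunction("__nv_rnorm3d", Generic->getFunctionType());
      Generic->replaceAllUsesWith(Device.getCallee());
    }
  }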

>>   I added a rough outline of how I imagined the internal math.h header would 
>> look as a comment in D47849. Could you elaborate on how that differs from 
>> what you imagine and how the other intrinsics come in?
> 
> That looks like what I had in mind (including 
> `__clang_cuda_device_functions.h` to get the device functions).
> 
>> 
>> 
>>> 3. Being similar to the "declare variant" functionality from OpenMP 5, and 
>>> thus, I suspect, closer to the solution we'll eventually be able to apply 
>>> in a standard way to all targets.
>> 
>> I can see this.
>> 
>>>> This solution follows Alexey's suggestions. It allows the optimization of 
>>>> math calls where they apply (example: pow(x,2) => x*x), which was one of 
>>>> the issues with the previous solution I implemented.
>>> 
>>> So we're also missing that optimization for CUDA code when compiling with 
>>> Clang? Isn't this also something that, regardless, should be fixed?
>> 
>> Maybe through general built-in recognition and late lowering into 
>> target-specific implementations/intrinsics again?
> 
> I suspect that we need to match the intrinsics and perform the optimizations 
> in LLVM at that level in order to get the optimizations for CUDA.

That seems reasonable to me. We could also match other intrinsics here, e.g., 
`rnorm3d`, both by name and by computation pattern.
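
To make the pow(x,2) case concrete (illustrative example only, not code from 
this patch): the simplification is available only while the optimizer still 
sees the standard call.

  #include <math.h>

  double square(double x) {
    // While this is still the standard pow (or llvm.pow), the optimizer can
    // turn it into x * x.  Once a header has rewritten the call to, e.g.,
    // __nv_pow, that simplification is no longer recognized.
    return pow(x, 2.0);
  }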

In D60907#1484643 <https://reviews.llvm.org/D60907#1484643>, @tra wrote:

> +1 to Hal's comments.
>
> @jdoerfert :
>
> > I'd even go as far as to argue that __clang_cuda_device_functions.h should 
> > include the internal math.h wrapper to get all math functions. See also the 
> > next comment.
>
> I'd argue the other way around -- include __clang_cuda_device_functions.h 
> from math.h and do not preinclude anything.
> If the user does not include math.h, their namespace should not be polluted 
> by some random stuff. NVCC did this, but that's one of the most annoying 
> 'features' we have to be compatible with for the sake of keeping existing 
> nvcc-compilable CUDA code happy.
>
> If users do include math.h, it should do the right thing for both sides of 
> the compilation.
> IMO it's math.h that should trigger pulling the device functions in.


I actually don't want to preinclude anything, and my arguments are (mostly) 
about the OpenMP offloading code path, not necessarily CUDA.
To clarify, what I want is:

1. Make sure `clang/Headers/math.h` is found first when `math.h` is included.
2. Use a scheme similar to the one described in 
https://reviews.llvm.org/D47849#1483653 in `clang/Headers/math.h` (a rough 
sketch follows below).
3. Only add `math.h` function overloads in our `math.h`. **<- This is 
debatable**
4. Include `clang/Headers/math.h` from `__clang_cuda_device_functions.h` to 
avoid duplication of math function declarations.
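
To make the shape of this concrete, a very rough sketch of what such a wrapper 
`math.h` could look like (the include guard, the `declare variant` selection, 
and the single `pow` entry are placeholders for illustration, not a final 
design):

  /* Illustrative sketch only -- not the actual header. */
  #ifndef __CLANG_OPENMP_MATH_H__
  #define __CLANG_OPENMP_MATH_H__

  /* (1) This header is found first and forwards to the system math.h, so the
     host declarations stay untouched. */
  #include_next <math.h>

  #ifdef _OPENMP
  /* Device implementation provided by CUDA's libdevice. */
  double __nv_pow(double, double);

  /* (3) Only standard math.h functions get an entry here; the device version
     is selected only when offloading to NVPTX. */
  #pragma omp declare variant(__nv_pow) match(device = {arch(nvptx, nvptx64)})
  double pow(double, double);
  #endif

  #endif /* __CLANG_OPENMP_MATH_H__ */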


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60907/new/

https://reviews.llvm.org/D60907


