================
@@ -22,15 +22,19 @@ using namespace ompx;
 
 namespace {
 
-void gpu_regular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct) {
+[[clang::always_inline]]
+static void gpu_regular_warp_reduce(void *reduce_data,
----------------
ro-i wrote:

> [...] I generally do think that needing to add always inline represents a 
> problem with our codegen or inliner heuristics.

The issue is that we didn't want actual indirect calls to the callback functions
passed to the reduction runtime.
As far as I understand:
We don't restrict the indirect call specialization done by the Attributor in
OpenMPOpt. (Introduced by
https://github.com/llvm/llvm-project/commit/9c08e76f3e5f2f3e8cb1e3c9fd45827395c712cc.)
Each helper function, such as `cpyFct`, `shflFct`, etc., exists once for every
reduction in a translation unit. They are not shared because the data types
could differ, and they aren't cached or anything similar. Then, in the
*not-inlined* case, the code that uses these functions has all of their bodies
inlined and basically compares the function pointers in a switch to determine
which of the inlined snippets it should execute.
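A rough plain-C++ sketch of what the specialized, not-inlined runtime function conceptually turns into. Assumptions: `ShuffleFnTy`, `shfl_reduction_a`, `shfl_reduction_b` and `warp_reduce_specialized` are hypothetical stand-ins (not the DeviceRTL names or signatures), an `int` stands in for the reduction data, and because C++ cannot `switch` on function pointers, the "switch" is written as a chain of pointer comparisons.

```cpp
#include <cassert>

// Hypothetical simplified callback type; the real ShuffleReductFnTy differs.
using ShuffleFnTy = void (*)(int *data);

// Hypothetical per-reduction helpers: one instance per reduction in the TU.
static void shfl_reduction_a(int *data) { *data += 1; }  // reduction #1
static void shfl_reduction_b(int *data) { *data *= 2; }  // reduction #2

// One shared, not-inlined runtime function. After indirect call
// specialization, the call through shflFct is conceptually replaced by
// comparisons against every known callee, each guarding an inlined body.
static void warp_reduce_specialized(int *data, ShuffleFnTy shflFct) {
  if (shflFct == shfl_reduction_a) {
    *data += 1;     // inlined body of shfl_reduction_a
  } else if (shflFct == shfl_reduction_b) {
    *data *= 2;     // inlined body of shfl_reduction_b
  } else {
    shflFct(data);  // fallback: an actual indirect call
  }
}
```

Every additional reduction in the translation unit adds another comparison and another inlined body to this single shared function.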
Inlining the reduction functions themselves solves all of this: we then have one
instance per reduction, i.e. a 1:1 mapping between the helper functions and the
functions that use them, so there is no need to switch over the helper function
pointers anymore.
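A minimal sketch of the inlined case, under the same assumptions as before: all names and the `int`-based callback type are hypothetical simplifications, and only `[[clang::always_inline]]` is taken from the diff (other compilers may ignore it with a warning). Once `warp_reduce` is inlined into each caller, the callback argument is a known constant there, so the call can be resolved directly without any pointer comparisons.

```cpp
#include <cassert>

// Hypothetical simplified callback type; the real ShuffleReductFnTy differs.
using ShuffleFnTy = void (*)(int *data);

// Hypothetical per-reduction helpers, one per reduction.
static void shfl_reduction_a(int *data) { *data += 1; }
static void shfl_reduction_b(int *data) { *data *= 2; }

// Runtime helper forced inline, mirroring the attribute added in the diff.
[[clang::always_inline]] static inline void warp_reduce(int *data,
                                                        ShuffleFnTy shflFct) {
  shflFct(data);  // after inlining, shflFct is a constant at each call site
}

// Each reduction gets its own inlined copy of warp_reduce: a 1:1 mapping
// between the helper and its user, so the call becomes a direct call.
void reduction_a(int *data) { warp_reduce(data, shfl_reduction_a); }
void reduction_b(int *data) { warp_reduce(data, shfl_reduction_b); }
```

The trade-off is code size: the runtime helper is duplicated per reduction instead of being shared behind a dispatch chain.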

https://github.com/llvm/llvm-project/pull/196061
_______________________________________________
llvm-branch-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits