[clang] [llvm] [clang][OpenMP] Improve loop structure for distributed loops (PR #201670)

Robert Imschweiler via cfe-commits Mon, 08 Jun 2026 11:57:34 -0700

ro-i wrote:

I realized that I had a testing issue. From my reduction tests, I had kept the 
result verification in every test run (because I always want more testing 
guards against race conditions etc.). In my new non-reduction test cases, 
however, that hurts testing speed because the checks are O(n). Because of 
that, I previously only tested small N (4,096 or 65,535) instead of my usual 
default of 177,777,777.
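
To illustrate the trade-off, here is a minimal sketch of a harness that keeps the O(n) check but runs it only once, outside the timed region. This is not the actual test suite: `elem_kernel`, `verify`, `BenchResult`, and `run_benchmark` are hypothetical names, and the host-side `parallel for` is only a stand-in for the distributed loops under test.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical element-wise kernel standing in for a non-reduction workload.
// The pragma is only active when compiled with OpenMP support (-fopenmp).
void elem_kernel(const std::vector<double> &in, std::vector<double> &out) {
  const std::size_t n = in.size();
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (std::size_t i = 0; i < n; ++i)
    out[i] = 2.0 * in[i] + 1.0;
}

// O(n) check of every element; this is the cost that grows with N.
bool verify(const std::vector<double> &in, const std::vector<double> &out) {
  for (std::size_t i = 0; i < in.size(); ++i)
    if (out[i] != 2.0 * in[i] + 1.0)
      return false;
  return true;
}

struct BenchResult {
  double best_seconds; // fastest timed run
  bool verified;       // outcome of the single verification pass
  int verify_calls;    // number of O(n) checks performed
};

BenchResult run_benchmark(std::size_t n, int runs) {
  assert(runs > 0);
  std::vector<double> in(n), out(n, 0.0);
  // Small integer inputs keep 2*x+1 exact, so comparing with != is safe.
  for (std::size_t i = 0; i < n; ++i)
    in[i] = static_cast<double>(i % 1024);

  BenchResult r{1e300, false, 0};
  for (int run = 0; run < runs; ++run) {
    auto t0 = std::chrono::steady_clock::now();
    elem_kernel(in, out);
    auto t1 = std::chrono::steady_clock::now();
    double s = std::chrono::duration<double>(t1 - t0).count();
    if (s < r.best_seconds)
      r.best_seconds = s;
    // Verify after the first run only, and outside the timed region.
    if (run == 0) {
      r.verified = verify(in, out);
      ++r.verify_calls;
    }
  }
  return r;
}
```

Calling `run_benchmark(177777777, 10)` would then pay for the check once instead of ten times. The downside is that runs after the first are no longer guarded against race conditions, so this only fits workloads where that guard matters less.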


With N = 177,777,777, we get the following perf change for **non-reduction** workloads:
```
misc_stencil           double   change for 208 teams:   +47.45%   change for 10400 teams:   -30.69%
misc_elem_func         double   change for 208 teams:   +42.02%   change for 10400 teams:   +63.05%
misc_elem_loop         double   change for 208 teams:   +32.79%   change for 10400 teams:   -19.86%
misc_linalg            double   change for 208 teams:   +31.96%   change for 10400 teams:   -20.79%
misc_particle          double   change for 208 teams:   +13.89%   change for 10400 teams:    +3.09%
misc_stencil           uint     change for 208 teams:   +36.12%   change for 10400 teams:    -0.79%
misc_elem_func         uint     change for 208 teams:  +117.16%   change for 10400 teams:   +16.39%
misc_elem_loop         uint     change for 208 teams:   +37.26%   change for 10400 teams:   +26.12%
misc_linalg            uint     change for 208 teams:   +36.46%   change for 10400 teams:   +23.19%
misc_particle          uint     change for 208 teams:   +10.88%   change for 10400 teams:    -0.22%
misc_stencil           ulong    change for 208 teams:   +45.55%   change for 10400 teams:   -31.35%
misc_elem_func         ulong    change for 208 teams:   +39.18%   change for 10400 teams:   +66.38%
misc_elem_loop         ulong    change for 208 teams:   +38.42%   change for 10400 teams:   -23.92%
misc_linalg            ulong    change for 208 teams:   +37.38%   change for 10400 teams:   -24.18%
misc_particle          ulong    change for 208 teams:   +13.16%   change for 10400 teams:    +1.76%
misc_stencil           Value    change for 208 teams:    -3.31%   change for 10400 teams:    -0.76%
misc_elem_func         Value    change for 208 teams:    +0.55%   change for 10400 teams:    +1.42%
misc_elem_loop         Value    change for 208 teams:    -1.26%   change for 10400 teams:    -2.87%
misc_linalg            Value    change for 208 teams:    -0.73%   change for 10400 teams:   -15.36%
misc_particle          Value    change for 208 teams:    -1.15%   change for 10400 teams:    -0.12%
``` 

There is probably potential in the non-reduction cases, but I'll change this 
PR to only handle the reduction cases for now. The other cases would need more 
analysis to get the most out of them, and I need to focus on cross-team 
reduction for the moment.

https://github.com/llvm/llvm-project/pull/201670