loopacino wrote:

> Please create separate PRs for Clang and MLIR. These are separate 
> implementations don't need to be reviewed together. I am focusing on Clang in 
> this PR.
> 
> None of the added test are actually requiring the intratile loops to be 
> canonical. That is, something where an surrounding loop-associated directives 
> affects a grid/floor loop as well as a intratile loop at the same time. For 
> instance:
> 
> ```c++
> #pragma omp for collapse(2)
> #pragma omp tile size(2)
> for (int i = 0; i < n; ++i) 
>    ...
> ```
> 
> or two-level tiling:
> 
> ```c++
> #pragma omp tile sizes(3, 5)
> #pragma omp tile size(2)
> for (int i = 0; i < n; ++i) 
>    ...
> ```
> 
> `IntraTileMustBeCanonical` is an ok-ish first heuristic, a refinement would 
> be to which of the tile loops are actually affected and thus would need to be 
> canonical. For instance:
> 
> ```c++
> #pragma omp for collapse(3)
> #pragma omp tile size(2, 3)
> for (int i = 0; i < n; ++i) 
>   for (int j = 0; j < n; ++j) 
>    ...
> ```
> 
> here, the intratile-loop for `j` does not need to be canonical, but `i` does. 
> Because as you found out emitting the canonical version of the loop inhibits 
> vectorization, it would be quite important to eventually have this.
> 
> Because Clang implements loop-transformations using AST transforms, I think 
> it would be easier and more accurate if we always emit the canonical form, 
> but emit a marker that allows to optimize the canonical form to the 
> non-canonical one later, as suggested 
> [here](https://github.com/llvm/llvm-project/pull/191114#issuecomment-4313291974).
>  For instance, it could also emit an alternatve AST depending on the 
> situation we are in, like 
> `OMPCanonicalLoopNestTransformationDirective::getTransformed()` does. Imagine 
> such an AST:
> 
> ```
> OMPLoopAlternative
> `- getCanonical(): ForStmt // Predicated version when canonical version is 
> needed
>    `- for (int i = 0; i < n; ++)
>       `- if (i < .floor.iv+tilesize)
> `- getOptimized(): ForStmt // Vectorizable version when not used as an 
> affected loop
>    `- for (int i = 0; i < min(n, .floor.iv+tilesize; ++)
> ```
> 
> Storing both versions at the same time might not be the best solution due to 
> the overhead of storing both. Instance, it could just add a marker (such as 
> `AttributedStmt`) to the loop which says that the ForStmt can be emitted as 
> the optimized one at CodeGen, i.e.:
> 
> ```
> for (int i = 0; i < n; ++)
> `- AttributedStmt __attribute__((__invariant_pred_bound)) // States that the 
> RHS is invariant to the loop using the LHS as IV
>     `- if (i < .floor.iv+tilesize)
>         `- ...
> ```
> 
> is emitted as
> 
> ```
> for (int i = 0; i < min(n,  .floor.iv+tilesize); ++)
> `- ...
> ```
> 
> I think both approaches are viable, but we don't need to start with a 
> `IntraTileMustBeCanonical` if eventually we are going to do the second. Which 
> approach would you prefer?
> 
> When I try this out, I get 3 failing tests:
> 
> ```
> Failed Tests (3):
>   libomp :: transform/tile/foreach.cpp
>   libomp :: transform/tile/intfor.c
>   libomp :: transform/tile/iterfor.cpp
> ```
> 
> Test updates in this PR are adding more iterator evaluations, which assume is 
> because mostly because fewer invariant values are precomputed? Ideally, If 
> `IntraTileMustBeCanonical` is false it generates the same code. This should 
> actually be easier to do with an `IntraTileMustBeCanonical` heursitic than 
> with the marker approach.

1. Reverted MLIR changes
2. As suggested, went with the marker-based approach.
3. All tests pass now.

https://github.com/llvm/llvm-project/pull/191114
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to