skatrak wrote:

> > Maybe support for this operation could be just based on changes to how the 
> > MLIR representation is built in the first place, what do you think?
> 
> This is partly what this implementation aims to do. In fact, after the pass 
> that lowers the omp.workshare operation, we are left with IR very close to 
> the one you showed in your example (please take a look at some of the tests 
> in #101446).
> 
> The approach taken here is similar to the omp.workdistribute implementation, 
> in that the purpose of the omp.workshare and omp.workshare.loop_wrapper ops 
> is to preserve the high-level optimizations available when using HLFIR. Once 
> we are done with the LowerWorkshare pass, both omp.workshare and 
> omp.workshare.loop_wrapper disappear.
> 

I see that this approach significantly reduces the amount of OpenMP-specific 
handling needed during the creation of Fortran loops, so I don't have an issue 
with adding the `omp.workshare` and `omp.workdistribute` ops and creating a 
late-running transformation pass, rather than directly generating the 
"lower-level" set of OpenMP operations that represent the semantics of these 
constructs.
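
For reference, by the "lower-level" set I mean roughly the shape below: 
iterations of loops that come from array operations are shared via 
`omp.wsloop`, and everything else executes as if by a single thread via 
`omp.single`. This is only a sketch of my mental model; SSA values, clauses 
and the exact printed syntax are elided/approximate.

```mlir
omp.parallel {
  // Loop derived from an array assignment: its iterations are shared
  // across the team of threads created by omp.parallel.
  omp.wsloop {
    omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
      // ... element-wise work ...
      omp.yield
    }
    omp.terminator
  }
  // Remaining code of the construct executes as if by a single thread.
  omp.single {
    // ... scalar assignments, etc. ...
    omp.terminator
  }
  omp.terminator
}
```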

> The sole purpose of the omp.workshare.loop_wrapper op is to be able to more 
> explicitly mark loops that need to be "parallelized" by the workshare 
> construct and preserve that information through the pipeline. Its lifetime is 
> from the frontend (Fortran->{HLFIR,FIR}) up to the LowerWorkshare pass, which 
> runs after we are done with HLFIR optimizations (after HLFIR->FIR lowering); 
> the same applies to omp.workshare.
> 
> The problem with trying to convert fir.do_loop ops to wsloop is that it is 
> harder to keep track of where they came from: did they come from an array 
> intrinsic, which needs to be parallelized, or from a do loop which the 
> programmer wrote in the workshare construct, which must not be parallelized?
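
If I understand the intent, the IR just before LowerWorkshare would then look 
roughly like this (again just a sketch, with SSA values and exact printed 
syntax approximate): the wrapper marks only the loops that came from array 
operations, while loops that must stay sequential remain plain `fir.do_loop`s.

```mlir
omp.parallel {
  omp.workshare {
    // Loop produced from an array assignment/intrinsic: marked so that
    // LowerWorkshare knows to share its iterations across the team.
    omp.workshare.loop_wrapper {
      omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
        // ... element-wise work ...
        omp.yield
      }
      omp.terminator
    }
    // Loop that must not be parallelized: left as a plain fir.do_loop.
    fir.do_loop %j = %lb to %ub step %step {
      // ... sequential body ...
    }
    omp.terminator
  }
  omp.terminator
}
```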

I guess what I still don't understand is the need for the `omp.work{share, 
distribute}.loop_wrapper` operations. To tell apart a sequential loop from a 
parallel loop inside of a workshare or workdistribute construct, we already 
have `fir.do_loop` and `omp.wsloop + omp.loop_nest`. If I'm not wrong, this PR 
prepares the `genLoopNest` function to later specify when to produce a parallel 
or a sequential loop inside of a workshare construct so that you can create 
`omp.workshare.loop_wrapper` or `fir.do_loop`. What I'm saying is that we can 
just use `omp.wsloop` in place of the former, because it already has the meaning 
of "share iterations of the following loop across threads in the current team 
of threads". And that team of threads is defined by the parent `omp.parallel`, 
rather than `omp.workshare`. I can't think of a semantic difference between 
encountering `omp.wsloop` directly nested inside of `omp.parallel` vs nested 
inside of an `omp.workshare` nested inside of `omp.parallel`. What changes is 
everything else, which in the second case is functionally equivalent to being 
inside of an `omp.single`. Does that make sense or am I still missing something?
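
In other words, assuming `omp.wsloop` were allowed directly inside the region 
of `omp.workshare` (sketch only, exact syntax aside), I would expect these two 
shapes to describe the same worksharing behavior for the loop itself:

```mlir
// (a) Worksharing loop directly inside the parallel region.
omp.parallel {
  omp.wsloop {
    omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
      // ... loop body ...
      omp.yield
    }
    omp.terminator
  }
  omp.terminator
}

// (b) The same worksharing loop, nested inside omp.workshare.
omp.parallel {
  omp.workshare {
    omp.wsloop {
      omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
        // ... loop body ...
        omp.yield
      }
      omp.terminator
    }
    omp.terminator
  }
  omp.terminator
}
```

The only difference I can see is how the rest of the `omp.workshare` region is 
handled, which comes back to the `omp.single`-like behavior I mentioned above.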

https://github.com/llvm/llvm-project/pull/101445