[PATCH] D82470: [OpenMP][IRBuilder] Support allocas in nested parallel regions

Fady Ghanim via Phabricator via cfe-commits Fri, 03 Jul 2020 01:36:15 -0700

fghanim added inline comments.


================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:282
+  IRBuilder<> AllocaBuilder;
+
   /// Map to remember source location strings
----------------
jdoerfert wrote:
> fghanim wrote:
> > jdoerfert wrote:
> > > fghanim wrote:
> > > > jdoerfert wrote:
> > > > > fghanim wrote:
> > > > > > What's the benefit of this over just maintaining an alloca 
> > > > > > insertion point?
> > > > > > 
> > > > > > With this we will have 3 `IRBuilder`s to maintain and keep track 
> > > > > > of: the clang (or flang) `IRBuilder`, the OMP `IRBuilder` and the 
> > > > > > `allocaBuilder`
> > > > > No functional difference. Basically the same reason why clang has an 
> > > > > AllocIRBuilder (I think), we can use it independently of the other 
> > > > > one. The alternative is to save the builder IP, set it to the alloca 
> > > > > IP, do alloca stuff, reset the builder IP, when you use the 
> > > > > AllocaIRBuilder below. Again, no functional difference, just less 
> > > > > setting/resetting of IPs.
> > > > I understand where you are coming from, and I understand that 
> > > > functionally there is mostly no change - for now. But I prefer we do a 
> > > > small struct with some RAII, over adding an entire builder. I think 
> > > > it's a bit overkill and will be a source to lots of trouble.
> > > > 
> > > > Clang maintains an instruction `AllocaInsertPt`, not a specific 
> > > > builder. Speaking of which, This is completely from memory (i.e. check 
> > > > to make sure); I think we already save the previous `AllocaInsertionPt` 
> > > > in clang as part of the `bodyGenCB`. In which case, this is not needed; 
> > > > a nested `parallel` will always be generated by clang as part of the 
> > > > `bodyGenCB`, which in turn is going to save the Alloca insertion point 
> > > > for us. 
> > > > To enable the creation of the allocas for `TID` and `ZeroAddr` in the 
> > > > outer function, Why not pass current `AllocaInsertionPt` as an arg. to 
> > > > `createParallel`
> > > > I understand where you are coming from, and I understand that 
> > > > functionally there is mostly no change - for now. But I prefer we do a 
> > > > small struct with some RAII, over adding an entire builder. I think 
> > > > it's a bit overkill and will be a source to lots of trouble.
> > > 
> > > Could you explain what kind of overkill you see here? And maybe also the 
> > > lots of trouble you expect from a builder versus a insertion point? It is 
> > > the same information, just packed in a different struct, right?
> > > 
> > > 
> > > > Clang maintains an instruction AllocaInsertPt, not a specific builder. 
> > > 
> > > Right.
> > > 
> > > 
> > > > Speaking of which, This is completely from memory (i.e. check to make 
> > > > sure); I think we already save the previous AllocaInsertionPt in clang 
> > > > as part of the bodyGenCB. 
> > > 
> > > I think so.
> > > 
> > > 
> > > > In which case, this is not needed; 
> > > 
> > > Not strictly, no. 
> > > 
> > > > a nested parallel will always be generated by clang as part of the 
> > > > bodyGenCB,
> > > 
> > > Yes.
> > > 
> > > > which in turn is going to save the Alloca insertion point for us.
> > > 
> > > Yes.
> > > 
> > > > To enable the creation of the allocas for TID and ZeroAddr in the outer 
> > > > function, Why not pass current AllocaInsertionPt as an arg. to 
> > > > createParallel
> > > 
> > > So far, because this will not be the only one that needs to create 
> > > allocas. If you want to add the allocaIP to all runtime calls that create 
> > > allocas, I'm fine with that too.
> > > Could you explain what kind of overkill you see here?
> > Using an IRBuilder when an insertion point suffices
> > 
> > > And maybe also the lots of trouble you expect from a builder versus a 
> > > insertion point?
> > Currently, when using the OMPBuilder you need to juggle two IRBuilders, 
> > let's not make them 3 :)
> > 
> > > It is the same information, just packed in a different struct, right?
> > no it is not. `InsetrionPoint` vs  struct that contains `Insertionpoint` + 
> > other things. But that's beside the point
> > 
> > I mean, is the `AllocaBuilder` providing a new functionality other than a 
> > convenient way to keep track of the Alloca `InsertionPoint`?
> > Personally, I prefer some suitable structure to your needs and a stack 
> > (i.e. the way clang does it) . This should also resolve the todo below. 
> > Alternatively, pass an allocation point as an argument as suggested 
> > earlier. I am open to any third way if you have any.
> > Personally, I prefer some suitable structure to your needs and a stack 
> > (i.e. the way clang does it).
> 
> The stack is already here, implicitly, and actually on the stack. With
> `IRBuilder<>::InsertPointGuard AIPG(AllocaBuilder);`
> we have all the stack we need without leaking the state.
> The updates to the insertion point are done for use and the AllocaBuilder is 
> always ready to be used. No need to create a builder or update an insertion 
> point (other than the places that create the new alloca insertion points). 
> Note that the suitable structure is the builder, as it is the only 
> interaction with the insertion point, e.g. we do not pass the insertion point 
> to other functions.
> 
> 
> > InsetrionPoint vs struct that contains Insertionpoint + other things. But 
> > that's beside the point
> 
> I disagree. The "other things" is the only interaction we have with the 
> insertion point. So storing or accepting an insertion point is fine for the 
> only purpose of creating a Builder and interacting with the builder. The 
> difference is that we need to add more churn to update and reset the 
> insertion point (what happens behind the scenes with the one line I quoted 
> above).
> 
> 
> > Alternatively, pass an allocation point as an argument as suggested earlier.
> 
> I did not go this route because I think it complicates the API with little 
> gain. I was also unsure what the flang situation looks like here. Do we have 
> an alloca insertion point there too?
> At the end of the day, this makes it easier for the user.
But we haven't solved a problem, we just kicked it down the road with the TODO 
you added, with the only trade-off being convenience now. The `AllocaBuilder` 
is not usable by the frontend - the frontend doesn't & **shouldn't** know or 
care about any builders other than its own. However, an insertion point can be 
passed to the frontend (i.e. `bodyCB`). So keeping track of an Alloca insertion 
point across nests -similar to how clang does it- is what we need. I still 
prefer having a stack to push to - maybe even piggy back on the finalization 
stack by creating a struct that contains all relevant info per nest level. But 
you don't like it, fine.

Then just pass an argument for allocaIP and be done with it. Flang (or clang or 
whichever other frontend) are required to emit allocas into function entry 
block by all LLVM passes that follow. As such, we can safely assume the 
outermost call is always going to have the AllocaIP in the entry block of the 
enclosing function. Which means the first time we call `CreateParallel` they 
can pass that as arg.. or pass an empty insertion point, in which case we 
already detect it.

Adding a builder for every insertion point that we need to pass across nests or 
regions is not something we want to do for the reasons I mentioned earlier. 
Whether we like it or not, it is a builder we have to juggle along the other 
two, no matter how much we try to specialize its use, or keep it "hidden".



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82470/new/

https://reviews.llvm.org/D82470



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D82470: [OpenMP][IRBuilder] Support allocas in nested parallel regions

Reply via email to