[PATCH] D145815: [Flang][Driver] Add support for fopenmp-is-device and fembed-offload-object to Flang ToolChain

Andrew Gozillon via Phabricator via cfe-commits Thu, 16 Mar 2023 17:01:43 -0700

agozillon added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Flang.cpp:128
+  if (IsHostOffloadingAction) {
+    for (size_t i = 1; i < Inputs.size(); ++i) {
+      if (Inputs[i].getType() != types::TY_Nothing)
----------------
awarzynski wrote:
> jhuber6 wrote:
> > agozillon wrote:
> > > jhuber6 wrote:
> > > > agozillon wrote:
> > > > > awarzynski wrote:
> > > > > > agozillon wrote:
> > > > > > > awarzynski wrote:
> > > > > > > > What's the magic "1"? And given that the input count matters 
> > > > > > > > here - is there a test with multiple inputs?
> > > > > > > It aims to mimic the behavior of Clang: 
> > > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L4561
> > > > > > >  where the main input is skipped (the input currently being 
> > > > > > > compiled or embedded into etc.), when adding to 
> > > > > > > //-fembed-offload-object//. 
> > > > > > > 
> > > > > > > It does look different to Clang's as Clang has more cases and the 
> > > > > > > logic is spread across the constructJob invocation, but the first 
> > > > > > > if case is what the if statement inside of the loop and setting 
> > > > > > > the loop index variable to 1 do. The HostOffloadingInputs array 
> > > > > > > is what is being generated here, except we're skipping and 
> > > > > > > directly applying it as arguments.
> > > > > > > 
> > > > > > > I tried to condense it a little in this case! Perhaps it loses 
> > > > > > > readability though, I had hoped the comment might have kept it 
> > > > > > > clear
> > > > > > Thanks for the link - that code in Clang doesn't really clarify 
> > > > > > what makes `Inputs[0]` special 🤔 . 
> > > > > > 
> > > > > > Let me rephrase my question - what's so special about the first 
> > > > > > input? (referred to in Clang as "main input") Is that something 
> > > > > > specific to OpenMP? For example, in this case:
> > > > > > ```
> > > > > > flang-new  -fopenmp  file.f90
> > > > > > ```
> > > > > > I assume that `inputs[0]` is "file.f90", so nothing will happen?
> > > > > > 
> > > > > > > I tried to condense it a little in this case! Perhaps it loses 
> > > > > > > readability though, I had hoped the comment might have kept it 
> > > > > > > clear
> > > > > > 
> > > > > > Nah, I think that your implementation is fine. It's my ignorance 
> > > > > > with respect to OpenMP that's the problem here ;-)
> > > > > It's not specific to OpenMP I believe, as far as I am aware Clang's 
> > > > > supported offload models (SYCL and CUDA as well as OpenMP) use it! In 
> > > > > Flang's case we only really care about OpenMP as I believe it's the 
> > > > > only offload programming model supported.
> > > > > 
> > > > > In the case of the command: 
> > > > > 
> > > > > ```
> > > > > flang-new -fopenmp file.f90
> > > > > ``` 
> > > > > The code should never be executed as no part of the command will 
> > > > > enable an offloading action (for device or host)! But yes inputs[0] 
> > > > > would indeed refer to file.f90.
> > > > > 
> > > > > However, this code becomes relevant when you utilise an option that 
> > > > > enables the clangDriver to perform some form of offloading action. 
> > > > > For example a command like: 
> > > > > 
> > > > > ```
> > > > > flang-new -fopenmp --offload-arch=gfx90a file.f90 
> > > > > ```
> > > > > Will trigger two phase compilation, one for the host device (your 
> > > > > resident CPU in this command) and one for the device (gfx90a in this 
> > > > > command), the regular host pass will invoke like your provided 
> > > > > command and the device pass will then invoke with -fopenmp-is-device 
> > > > > in addition alongside the device triple. This generates two bitcode 
> > > > > files from the one file, one containing the host code from the file, 
> > > > > the other the device code (extracted from OpenMP target regions or 
> > > > > declare target etc.). 
> > > > > 
> > > > > However, now we have two files, with both parts of our program, we 
> > > > > need to conjoin them together, the clangDriver generates an 
> > > > > embeddable fat-binary/binary using the clang-offload-packager and 
> > > > > then invokes flang-new again, and this is where the above code 
> > > > > becomes relevant, as this binary (or multiple binaries, if we target 
> > > > > multiple devices in the same program) is what is passed to 
> > > > > -fembed-offload-object! And inputs[0] in this case would refer to the 
> > > > > output from the original host pass, which is what we want to embed 
> > > > > the device binary into, so we wish to skip this original host output 
> > > > > and only pass the subsequent inputs (which should be device binaries 
> > > > > when the clangDriver initiates a host offloading action) we want to 
> > > > > embed as -fembed-offload-object arguments. 
> > > > > 
> > > > > The offloading driver is quite complex and my knowledge of it is not 
> > > > > perfect as I am not our resident expert on it unfortunately (so if 
> > > > > anyone sees anything incorrect, please do chime in and correct me)! 
> > > > > 
> > > > > But hopefully this answers your question and gives you an idea of 
> > > > > when and how this -fembed-offload-object comes into play, essentially 
> > > > > a way to get the device binaries we wish to insert into the host 
> > > > > binary, so it can load the binaries at runtime and execute them. 
> > > > > Currently upstream Flang doesn't utilise this option of course, but 
> > > > > we intend to use this as part of our OpenMP offloading efforts for 
> > > > > AMD devices (whilst leaving the door open for other vendors devices 
> > > > > as well). We are trying to re-use/mimic as much of the existing 
> > > > > machinery that the clangDriver implements as we can! 
> > > > >  
> > > > The compiler requires at least one input file to run, otherwise it 
> > > > exits early. Therefore we're guaranteed to have at least one input file 
> > > > in the list. Some functions need an input file, usually to write some 
> > > > temp name to, and `Inputs[0]` is the easiest way to get an input file.
> > > Thank you very much @jhuber6! I should have added you as a 
> > > subscriber/reviewer as well in hindsight, sorry about that. 
> > Sorry, this is two separate things. I was thinking about the driver's input 
> > list which behaves like I mentioned above.
> > 
> > Here is the input list to an actual compilation job. In this case we expect 
> > to only get one. E.g. `clang foo.c bar.c` gets turned into `clang -cc1 
> > foo.c` and `clang -cc1 bar.c`. The extra files added here are intended to 
> > be used as additional input handled separately. So, for example consider 
> > the following OpenMP offloading compilation.
> > 
> > ```
> > > clang input.c -fopenmp --offload-arch=gfx1030 -ccc-print-bindings
> > # "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: 
> > "/tmp/input-44c8b5.bc"
> > # "amdgcn-amd-amdhsa" - "clang", inputs: ["input.c", 
> > "/tmp/input-44c8b5.bc"], output: "/tmp/input-gfx1030-4471d9.bc"
> > # "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: 
> > ["/tmp/input-gfx1030-4471d9.bc"], output: "/tmp/input-83327d.out"
> > # "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-44c8b5.bc", 
> > "/tmp/input-83327d.out"], output: "/tmp/input-cdd693.o"
> > # "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: 
> > ["/tmp/input-cdd693.o"], output: "a.out"
> > ```
> > The first input file is the actual file meant to be compiled. The other 
> > files are handled separately. For the `amdgcn-amd-amdhsa` triple the extra 
> > input is the host bitcode we use to match symbols between the host and the 
> > device. For the `x64` triple the extra input is a binary blob to be 
> > embedded into the host.
> Thanks for this thorough explanation!
> 
> > We are trying to re-use/mimic as much of the existing machinery that the 
> > clangDriver implements as we can!
> 
> That makes a ton of sense 👍🏻 ! It would be good to add a note in the summary 
> (perhaps with a link to https://clang.llvm.org/docs/OffloadingDesign.html).
> 
> So basically `inputs[0]` is the host bitcode file, and this method is for 
> device files, right? Perhaps you could replace "primary input" with something 
> a bit more descriptive?
> 
> Btw, would you be willing to update 
> https://github.com/llvm/llvm-project/blob/main/flang/docs/FlangDriver.md with 
> some notes on offloading? Clang's documentation is very C-centric. Definitely 
> not in this patch ;-)
Happy to modify the summary, and also happy to update the comment to be a 
little more descriptive.

As for the documentation notes, I would be happy to eventually, when we've 
perhaps got an initial end-to-end flow fully integrated into Flang (or further 
on it's way)! There may be a better candidate to do so however, but if no one 
wishes too, I'd be more than happy to caretake it's addition.  


================
Comment at: flang/test/Driver/omp-frontend-forwarding.f90:1
+! REQUIRES: amdgpu-registered-target
+
----------------
awarzynski wrote:
> jhuber6 wrote:
> > agozillon wrote:
> > > awarzynski wrote:
> > > > Given that you use `-###`, I think that this can be skipped (please 
> > > > double check).
> > > It does appear that it can be, at the very least I can swap in an NVIIDIA 
> > > arch when I haven't configured the project to target it and it has no 
> > > issues! Thank you. 
> > I'm not completely familiar with Flangs status on this, do we have tests in 
> > tree that perform the entire build and check `-ccc-print-bindings/phases` 
> > like we do in Clang?
> > I'm not completely familiar with Flangs status on this
> 
> I don't recall any other efforts to support offloading in Flang's compiler 
> driver - very nice to see this being worked on!
No tests for that at the moment as far as offloading is concerned in upstream 
Flang. The full build process is still a WIP. At the moment this embed flag is 
ignored in upstream Flang and we still have a lot of progress downstream to 
make too. I think @jsjodin 
or @skatrak may have a better big picture view than I do as they did the 
initial offload driver work/research I think.

There are however, some ccc-print-phase tests for other components inside of 
the phases.f90 test file. If we wish to test the currently generated phases 
similarly to the command you showed previously, I would be more than happy to 
add it if you wish?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145815/new/

https://reviews.llvm.org/D145815

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D145815: [Flang][Driver] Add support for fopenmp-is-device and fembed-offload-object to Flang ToolChain

Reply via email to