[PATCH] D101630: [HIP] Fix device-only compilation

Yaxun Liu via Phabricator via cfe-commits Fri, 04 Jun 2021 06:37:02 -0700

yaxunl added a comment.

In D101630#2792160 <https://reviews.llvm.org/D101630#2792160>, @tra wrote:


> In D101630#2792052 <https://reviews.llvm.org/D101630#2792052>, @yaxunl wrote:
>
>> I think for intermediate outputs e.g. preprocessor expansion, IR, and 
>> assembly, probably it makes sense not to bundle by default.
>
> Agreed.
>
>> However, for default action (emitting object), we need to bundle by default 
>> since it was the old behavior and existing HIP apps depend on that.
>
> Existing use is a valid point.
> As a counterargument, I would suggest that in a compilation pipeline which 
> does include bundling, an object file for one GPU variant *is* an 
> intermediate output, similar to the ones you've listed above.
>
> The final product of device-side subcompilations is a bundle. The question is 
> what "-c" means: is it `produce an object file` or `compile till the 
> end of the pipeline`? 
> For CUDA and HIP compilation it's ambiguous. When we target just one GPU, it 
> would be closer to the former. In general, it would be closer to the latter. 
> NVCC side-steps the issue by using different flags, `-cubin/-fatbin`, to 
> disambiguate between two cases and avoid bolting on CUDA-related semantics on 
> the compiler flags that were not designed for that.
>
>> Then we allow -fhip-bundle-device-output to override the default behavior.
>
> OK. Bundling objects for HIP by default looks like a reasonable compromise. 
> It would be useful to generalize the flag to `-fgpu-bundle...` as it would be 
> useful if/when we want to produce a fatbin during CUDA compilation. I'd still 
> keep no-bundling as the default for CUDA's objects.
>
> Now that we are in agreement on what we want, the next question is *how* we 
> want to do it.
>
> It appears that there's a fair bit of similarity between what the proposed 
> `-fgpu-bundle` flag does and the handful of `--emit-...` options clang has 
> now.
> If we were to use something like `--emit-gpu-object` and `--emit-gpu-bundle`, 
> it would be similar to NVCC's `-cubin/-fatbin`. It would decouple the default 
> behavior for `-c --cuda-device-only` from the user's ability to specify what 
> they want, without burdening `-c` with additional flags that would have 
> different defaults under different circumstances.
>
> Compilation with "-c" would remain "compile till the end", whatever that 
> happens to mean for a particular language, and `--emit-gpu-object/bundle` would 
> tell the compiler how far we want it to proceed and what kind of output we want. 
> This would probably be easier to explain to the users as they are already 
> familiar with flags like `-emit-llvm`, only now we are dealing with an extra 
> bundling step in the compilation pipeline. It would also behave consistently 
> across CUDA and HIP even though they have different defaults for bundling for 
> the device-side compilation. E.g. `-c --cuda-device-only --emit-gpu-bundle` 
> will always produce a bundle with the object files for both CUDA and HIP and 
> `-c --cuda-device-only --emit-gpu-object` will always require single '-o' 
> output.
>
> WDYT? Does it make sense?

We will definitely need `-fgpu-bundle-device-output` to control bundling of 
intermediate files. Adding `--emit-gpu-object` and `--emit-gpu-bundle` on top 
of that may be redundant and could cause confusion. What if users specify `-c 
-fgpu-bundle-device-output --emit-gpu-object` or `-c 
-fno-gpu-bundle-device-output --emit-gpu-bundle`? To me, a single option, 
`-fgpu-bundle-device-output`, that controls all device output seems cleaner.
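
To make the single-option proposal concrete, here is a minimal sketch of the defaults discussed in this thread: intermediate outputs are not bundled, HIP objects are bundled, CUDA objects are not, and an explicit `-fgpu-bundle-device-output` or `-fno-gpu-bundle-device-output` overrides the default. It is a model of the discussion only, not the clang driver implementation; the function name, the output-kind strings, and the language strings are all hypothetical.

```python
from typing import Optional

# Hypothetical labels for the intermediate outputs named in the thread
# (preprocessor expansion, IR, assembly). Not clang identifiers.
INTERMEDIATE_OUTPUTS = {"preprocessed", "llvm-ir", "assembly"}


def should_bundle(language: str, output_kind: str,
                  bundle_flag: Optional[bool] = None) -> bool:
    """Return True if device-only output would be bundled.

    bundle_flag models the proposed option:
      True  -> -fgpu-bundle-device-output was given
      False -> -fno-gpu-bundle-device-output was given
      None  -> neither was given, so the default applies
    """
    # An explicit flag always overrides the default.
    if bundle_flag is not None:
        return bundle_flag
    # Intermediate outputs are not bundled by default.
    if output_kind in INTERMEDIATE_OUTPUTS:
        return False
    # Objects: bundle by default for HIP (existing behavior),
    # no bundling by default for CUDA.
    return language == "hip"
```

With one option there is a single override point, so every combination has one answer. With separate `--emit-gpu-object` and `--emit-gpu-bundle` options, the driver would also need a precedence rule for the conflicting combinations mentioned above.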


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
