yaxunl added a comment.

In D101630#2792160 <https://reviews.llvm.org/D101630#2792160>, @tra wrote:

> In D101630#2792052 <https://reviews.llvm.org/D101630#2792052>, @yaxunl wrote:
>
>> I think for intermediate outputs e.g. preprocessor expansion, IR, and
>> assembly, probably it makes sense not to bundle by default.
>
> Agreed.
>
>> However, for default action (emitting object), we need to bundle by default
>> since it was the old behavior and existing HIP apps depend on that.
>
> Existing use is a valid point.
> As a counterargument, I would suggest that in a compilation pipeline which
> does include bundling, an object file for one GPU variant *is* an
> intermediate output, similar to the ones you've listed above.
>
> The final product of device-side subcompilations is a bundle. The question is
> `what does "-c" mean?`. Is it `produce an object file` or `compile till the
> end of the pipeline`?
> For CUDA and HIP compilation it's ambiguous. When we target just one GPU, it
> would be closer to the former. In general, it would be closer to the latter.
> NVCC side-steps the issue by using different flags `-cubin/-fatbin` to
> disambiguate between two cases and avoid bolting on CUDA-related semantics on
> the compiler flags that were not designed for that.
>
>> Then we allow -fhip-bundle-device-output to override the default behavior.
>
> OK. Bundling objects for HIP by default looks like a reasonable compromise.
> It would be useful to generalize the flag to `-fgpu-bundle...` as it would be
> useful if/when we want to produce a fatbin during CUDA compilation. I'd still
> keep no-bundling as the default for CUDA's objects.
>
> Now that we are in agreement on what we want, the next question is *how* we
> want to do it.
>
> It appears that there's a fair bit of similarity between what the proposed
> `-fgpu-bundle` flag does and the handful of `--emit-...` options clang has
> now.
> If we were to use something like `--emit-gpu-object` and `--emit-gpu-bundle`,
> it would be similar to NVCC's `-cubin/-fatbinary`, would decouple the default
> behavior for `-c --cuda-device-only` from the user's ability to specify what
> they want without burdening `-c` with additional flags that would have
> different defaults under different circumstances.
>
> Compilation with "-c" would remain the "compile till the end", whatever it
> happens to mean for particular language and `--emit-object/bundle` would tell
> the compiler how far we want it to proceed and what kind of output we want.
> This would probably be easier to explain to the users as they are already
> familiar with flags like `-emit-llvm`, only now we are dealing with an extra
> bundling step in the compilation pipeline. It would also behave consistently
> across CUDA and HIP even though they have different defaults for bundling for
> the device-side compilation. E.g. `-c --cuda-device-only --emit-gpu-bundle`
> will always produce a bundle with the object files for both CUDA and HIP and
> `-c --cuda-device-only --emit-gpu-object` will always require single '-o'
> output.
>
> WDYT? Does it make sense?

We will certainly need `-fgpu-bundle-device-output` to control bundling of
intermediate files. Given that, adding `-emit-gpu-object` and `-emit-gpu-bundle`
may be redundant and could cause confusion: the two kinds of option can
contradict each other. What should happen if users specify
`-c -fgpu-bundle-device-output -emit-gpu-object` or
`-c -fno-gpu-bundle-device-output -emit-gpu-bundle`?

To me, a single option, `-fgpu-bundle-device-output`, that controls all device
output seems cleaner.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101630/new/

https://reviews.llvm.org/D101630

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits