https://github.com/yxsamliu updated https://github.com/llvm/llvm-project/pull/200197
>From dadbe14ddf43b631addae3940ac99825b26d5f19 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <[email protected]>
Date: Thu, 28 May 2026 11:11:01 -0400
Subject: [PATCH 1/2] [docs][HIP] Document offload PGO workflow

Add a section to HIPSupport.rst describing IR-level profile-guided
optimization for HIP device code. -fprofile-generate instruments both
host and device; the runtime writes a host .profraw and one set of
device .profraw files per --offload-arch= value, with the standard
LLVM_PROFILE_FILE substitutions applying to both. Host and each
per-architecture device profile are merged independently with
llvm-profdata, and the use-phase build feeds them back via
-Xarch_host -fprofile-use= and -Xarch_<gpu-arch> -fprofile-use= (with
-Xarch_device as a single-arch shorthand).

Also add a CUDA/HIP Language Changes entry in ReleaseNotes.rst.
---
 clang/docs/HIPSupport.rst   | 102 ++++++++++++++++++++++++++++++++++++
 clang/docs/ReleaseNotes.rst |   7 +++
 2 files changed, 109 insertions(+)

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 82070a4042679..99559548823b2 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -951,6 +951,108 @@ Open Questions / Future Developments
 4. Offload support might be extended to cases where the ``parallel_policy`` is
    used for some or all targets.
 
+Profile-Guided Optimization for Device Code
+===========================================
+
+Clang supports IR-level profile-guided optimization (PGO) for HIP device
+code on AMD GPUs. ``-fprofile-generate`` instruments both host and
+device code; running the instrumented binary writes separate host and
+device raw profiles, which are merged independently and consumed by a
+second build that passes the appropriate profile to each side.
+
+Prerequisites
+-------------
+
+The toolchain must be built with the AMDGPU profile runtime enabled,
+which requires building ``compiler-rt`` for the ``amdgcn-amd-amdhsa``
+target via the runtimes build. A minimal CMake configuration is:
+
+.. code-block:: console
+
+   $ cmake <llvm-project>/llvm \
+     -DLLVM_ENABLE_PROJECTS='clang;lld' \
+     -DLLVM_ENABLE_RUNTIMES=compiler-rt \
+     -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa' \
+     -DRUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES=<llvm-project>/compiler-rt/cmake/caches/AMDGPU.cmake \
+     -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='compiler-rt;libc' \
+     -DRUNTIMES_amdgcn-amd-amdhsa_RUNTIMES_USE_LIBC=llvm-libc
+
+``COMPILER_RT_BUILD_PROFILE_ROCM`` controls building the host-side
+ROCm/HIP device profile collection runtime, ``clang_rt.profile_rocm``.
+It is on by default for normal Linux and Windows compiler-rt builds,
+and off for bare-metal profile builds and unsupported hosts; leave it
+enabled. ``RUNTIMES_USE_LIBC=llvm-libc`` is required so the amdgcn
+profile compile picks up LLVM-libc's ``-isystem`` / ``-nostdlibinc``
+headers.
+
+Generate phase
+--------------
+
+The driver forwards ``-fprofile-generate`` to the device compiler and
+links the device profile runtime into the embedded device image.
+
+.. code-block:: console
+
+   $ clang++ -x hip demo.hip \
+     --offload-arch=gfx1100 --offload-arch=gfx1101 \
+     -fprofile-generate=pgo_data \
+     -o demo.instr
+
+   $ ./demo.instr
+
+When the instrumented binary exits, the runtime writes raw profile
+files into ``pgo_data/``. Host profiles use the standard LLVM profile
+filename; device profiles use the same filename with the GPU
+architecture name prepended to the basename, so each
+``--offload-arch=`` value produces its own set of device files. The
+usual ``LLVM_PROFILE_FILE`` substitutions (``%p`` for process ID,
+``%m`` for binary signature, etc.) apply to both, so multi-process
+runs do not need a separate device-side naming scheme.
+
+Merge the host profile and each device architecture's profile
+separately:
+
+.. code-block:: console
+
+   $ llvm-profdata merge -o host.profdata pgo_data/default_*.profraw
+   $ llvm-profdata merge -o device.gfx1100.profdata pgo_data/gfx1100*.profraw
+   $ llvm-profdata merge -o device.gfx1101.profdata pgo_data/gfx1101*.profraw
+
+Use phase
+---------
+
+Host and device compilations consume different profiles, and each GPU
+architecture consumes its own. ``-Xarch_host`` selects the host
+profile and ``-Xarch_<gpu-arch>`` selects the per-architecture device
+profile:
+
+.. code-block:: console
+
+   $ clang++ -x hip demo.hip \
+     --offload-arch=gfx1100 --offload-arch=gfx1101 \
+     -Xarch_host -fprofile-use=host.profdata \
+     -Xarch_gfx1100 -fprofile-use=device.gfx1100.profdata \
+     -Xarch_gfx1101 -fprofile-use=device.gfx1101.profdata \
+     -o demo
+
+For a single-arch build, ``-Xarch_device`` is a convenient shorthand
+that applies the same profile to every offload architecture:
+
+.. code-block:: console
+
+   $ clang++ -x hip demo.hip --offload-arch=gfx1101 \
+     -Xarch_host -fprofile-use=host.profdata \
+     -Xarch_device -fprofile-use=device.gfx1101.profdata \
+     -o demo
+
+Notes
+-----
+
+- The instrumented build is slower than a normal build; only the use
+  phase produces the optimized binary intended for deployment.
+- Set ``LLVM_PROFILE_VERBOSE=1`` to print runtime diagnostics for
+  profile file creation and device profile collection.
+
 SPIR-V Support on HIPAMD ToolChain
 ==================================
 
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index a984982d1bd41..d241132a4e9bf 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -901,6 +901,13 @@ CUDA/HIP Language Changes
 
 - The new offloading driver is now the default for HIP. Use
   `--no-oflfoad-new-driver` to return to the old behavior.
+- Added IR-level profile-guided optimization (PGO) support for HIP
+  device code on AMD GPUs. ``-fprofile-generate`` now instruments both
+  host and device; running the instrumented binary writes host and
+  per-GPU-architecture device raw profiles, which are merged separately
+  with ``llvm-profdata`` and fed back via ``-Xarch_host`` /
+  ``-Xarch_<gpu-arch>`` ``-fprofile-use=``. See :doc:`HIPSupport` for
+  the full workflow.
 
 CUDA Support
 ^^^^^^^^^^^^

>From d174b9afd99fc07a3a7c5832d175f1a8cd383707 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <[email protected]>
Date: Thu, 28 May 2026 10:36:57 -0400
Subject: [PATCH 2/2] [docs][HIP] Document source-based device code coverage
 workflow

Add a section to HIPSupport.rst describing how to produce source-based
code coverage reports for HIP device code on AMD GPUs: compile with
-fprofile-instr-generate -fcoverage-mapping, extract the device ELF
from the host binary's .hip_fatbin section, unbundle with
clang-offload-bundler using the hip-amdgcn-amd-amdhsa--<arch> target
ID, and run llvm-profdata / llvm-cov against the device object.
---
 clang/docs/HIPSupport.rst   | 53 +++++++++++++++++++++++++++++++++++++
 clang/docs/ReleaseNotes.rst |  6 +++++
 2 files changed, 59 insertions(+)

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 99559548823b2..940598d04e346 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -1053,6 +1053,59 @@ Notes
 - Set ``LLVM_PROFILE_VERBOSE=1`` to print runtime diagnostics for
   profile file creation and device profile collection.
 
+Source-Based Code Coverage for Device Code
+==========================================
+
+Clang supports source-based code coverage for HIP device code on AMD GPUs.
+Device code is instrumented with the same ``-fprofile-instr-generate
+-fcoverage-mapping`` flags used for host code; counters live in the device
+binary, are written to a ``.profraw`` file at process exit, and can be
+consumed by ``llvm-profdata`` and ``llvm-cov``.
+
+Prerequisites
+-------------
+
+Source-based device coverage relies on the AMDGPU profile runtime, so
+the toolchain must be built with the same CMake configuration used for
+HIP offload PGO. See the *Prerequisites* subsection under
+`Profile-Guided Optimization for Device Code`_.
+
+Example
+-------
+
+Given a HIP program ``demo.hip``, the following commands produce an LCOV
+report covering device code:
+
+.. code-block:: console
+
+   $ clang++ -x hip demo.hip \
+     --offload-arch=gfx1101 \
+     -fprofile-instr-generate -fcoverage-mapping \
+     -o demo
+
+   $ llvm-objcopy --dump-section=.hip_fatbin=fatbin.bin demo
+   $ clang-offload-bundler --type=o --input=fatbin.bin \
+     --output=device.gfx1101.o \
+     --targets=hip-amdgcn-amd-amdhsa--gfx1101 --unbundle
+
+   $ LLVM_PROFILE_FILE="cov.%p.profraw" ./demo
+   $ llvm-profdata merge -sparse -o cov.profdata cov.*.profraw
+
+   $ llvm-cov report device.gfx1101.o -instr-profile=cov.profdata
+   $ llvm-cov show device.gfx1101.o -instr-profile=cov.profdata
+   $ llvm-cov export device.gfx1101.o -instr-profile=cov.profdata \
+     -format=lcov > coverage.lcov
+
+The device ELF is extracted from the ``.hip_fatbin`` section of the host
+binary and then unbundled with ``clang-offload-bundler``. The unbundle
+target string uses the bundle ID ``hip-amdgcn-amd-amdhsa--<arch>``,
+which is the offload kind (``hip``) followed by the standard
+four-component target triple (``amdgcn-amd-amdhsa-``, with the empty
+environment field giving the trailing dash) and then the target ID
+(``<arch>``). See :doc:`ClangOffloadBundler` for the full bundle entry
+ID grammar. ``llvm-cov`` is invoked against the device object because
+the coverage mapping for device functions is emitted there.
+
 SPIR-V Support on HIPAMD ToolChain
 ==================================
 
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index d241132a4e9bf..96b8b565787eb 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -908,6 +908,12 @@ CUDA/HIP Language Changes
   with ``llvm-profdata`` and fed back via ``-Xarch_host`` /
   ``-Xarch_<gpu-arch>`` ``-fprofile-use=``. See :doc:`HIPSupport` for
   the full workflow.
+- Added source-based code coverage support for HIP device code on AMD
+  GPUs. ``-fprofile-instr-generate -fcoverage-mapping`` now instruments
+  device code; running the instrumented binary writes per-GPU
+  architecture raw profiles that can be merged with ``llvm-profdata``
+  and reported by ``llvm-cov`` against the extracted device code
+  object. See :doc:`HIPSupport` for the full workflow.
 
 CUDA Support
 ^^^^^^^^^^^^

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
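
Reader's note on the PGO merge step in patch 1/2: with several ``--offload-arch=`` values the per-architecture ``llvm-profdata merge`` invocations are easy to script. The following is only a sketch, not part of the patch. The helper name ``merge_commands`` is hypothetical, and the sample file names assume nothing beyond what the documentation states, namely that device raw profiles carry the GPU architecture name at the start of the basename. The real file names produced by the runtime may differ in detail.

```python
from pathlib import PurePosixPath


def merge_commands(profraw_paths, offload_archs):
    """Build one `llvm-profdata merge` argv per profile producer.

    Assumption: a device raw profile's basename starts with its GPU
    architecture name; every other .profraw file is treated as a host profile.
    """
    # Check longer names first so one arch name cannot shadow a longer one.
    archs_longest_first = sorted(offload_archs, key=len, reverse=True)

    groups = {"host": []}
    groups.update({arch: [] for arch in offload_archs})

    for path in sorted(profraw_paths):
        name = PurePosixPath(path).name
        owner = next(
            (arch for arch in archs_longest_first if name.startswith(arch)),
            "host",
        )
        groups[owner].append(path)

    commands = []
    for owner, files in groups.items():
        if not files:
            continue  # nothing was written for this producer
        output = "host.profdata" if owner == "host" else f"device.{owner}.profdata"
        commands.append(["llvm-profdata", "merge", "-o", output, *files])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = [
        "pgo_data/default_123.profraw",
        "pgo_data/gfx1100.default_123.profraw",
        "pgo_data/gfx1101.default_123.profraw",
    ]
    for argv in merge_commands(sample, ["gfx1100", "gfx1101"]):
        print(" ".join(argv))
```

Each returned list can be passed to ``subprocess.run``. The output names match the ``host.profdata`` / ``device.<arch>.profdata`` files that the use-phase examples in the patch consume.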
