================
@@ -951,6 +951,108 @@ Open Questions / Future Developments
 4. Offload support might be extended to cases where the ``parallel_policy`` is
    used for some or all targets.
 
+Profile-Guided Optimization for Device Code
+===========================================
+
+Clang supports IR-level profile-guided optimization (PGO) for HIP device
+code on AMD GPUs. ``-fprofile-generate`` instruments both host and
+device code; running the instrumented binary writes separate host and
+device raw profiles, which are merged independently and consumed by a
+second build that passes the appropriate profile to each side.
+
+Prerequisites
+-------------
+
+The toolchain must be built with the AMDGPU profile runtime enabled,
+which requires building ``compiler-rt`` for the ``amdgcn-amd-amdhsa``
+target via the runtimes build. A minimal CMake configuration is:
+
+.. code-block:: console
+
+   $ cmake <llvm-project>/llvm \
+       -DLLVM_ENABLE_PROJECTS='clang;lld' \
+       -DLLVM_ENABLE_RUNTIMES=compiler-rt \
+       -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa' \
+       -DRUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES=<llvm-project>/compiler-rt/cmake/caches/AMDGPU.cmake \
+       -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES='compiler-rt;libc' \
+       -DRUNTIMES_amdgcn-amd-amdhsa_RUNTIMES_USE_LIBC=llvm-libc
+
+``COMPILER_RT_BUILD_PROFILE_ROCM`` controls building the host-side
+ROCm/HIP device profile collection runtime, ``clang_rt.profile_rocm``.
+It is on by default for normal Linux and Windows compiler-rt builds,
+and off for bare-metal profile builds and unsupported hosts; leave it
----------------
jmmartinez wrote:

```suggestion
and `OFF` for bare-metal profile builds and unsupported hosts; leave it
```

https://github.com/llvm/llvm-project/pull/200208
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
